Jin Wang edited this page Mar 27, 2015 · 1 revision

Overview

This page describes the options available in the configure.ocelot file, which overrides the default options passed to a program linked against Ocelot. The file is currently written in JSON and is parsed by Ocelot when you launch a program that is linked against it.

Caveats

Ocelot searches for the configure.ocelot file in the current working directory. If the file is not found, Ocelot loads default values for all options.

Details

Available Options

File Scope

  • ocelot - Specifies that this is a configuration file for Ocelot. Currently not used.
  • version - Specifies the minimum version of Ocelot required. Currently not used.

trace

These options control trace generators in Ocelot. Enabling trace generators will attach them to kernels that execute on the emulator device.

  • enabled - Enable the use of trace generators.
  • database - Path of the database that tracks all traces produced by Ocelot.
  • memory - Enable the MemoryTraceGenerator.
  • branch - Enable the BranchTraceGenerator.
  • sharedComputation - Enable the SharedComputationTraceGenerator.
  • instruction - Enable the InstructionTraceGenerator.
  • parallelism - Enable the ParallelismTraceGenerator.
  • memoryChecker - Enable the Ocelot memory checker (catches out-of-bounds and unaligned memory accesses).
  • raceDetector - Enable the Ocelot race detector (catches data races on shared memory).
  • warpSynchronous
    • enable - Enable the WarpSynchronousTraceGenerator
    • emitHotPaths - Emit a DOT file depicting the control flow graph, with basic blocks color-coded by how often they were executed.
  • performanceBound
    • enable - Enable the PerformanceBoundTraceGenerator
    • protocol - Select the memory coalescing protocol used to compute how many transactions each memory access requires.
    • output - Specifies which output format to use ("dot" or "csv")
  • cacheSimulator
    • enable - Enable the CacheSimulator trace generator
    • writebackTime - How many cycles per dirty writeback
    • cacheSize - Total cache size in bytes
    • lineSize - Line size in bytes
    • hitTime - Cycles per hit
    • missTime - Cycles per miss
    • associativity - Cache associativity
    • instructionCache - Cache instruction memory rather than data memory
  • convergence
    • enabled - Enable the ConvergenceTraceGenerator
    • logfile - Where to store the log file
    • dot - Produce a DOT graph for each kernel
    • render - Render the graph (not implemented)
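
As a rough sketch of how the trace options above fit together, the fragment below turns on two simple generators plus the cache simulator and the convergence generator. It follows the bare-key style of the example file on this page. The database path is taken from that example. The numeric cache parameters and the logfile path are illustrative assumptions, not recommended or default values.

```json
{
	trace: {
		enabled: true,
		database: "traces/database.trace",
		memory: true,
		branch: true,
		cacheSimulator: {
			enable: true,
			cacheSize: 8192,
			lineSize: 64,
			associativity: 2,
			hitTime: 1,
			missTime: 200,
			writebackTime: 200,
			instructionCache: false
		},
		convergence: {
			enabled: true,
			logfile: "traces/convergence.log",
			dot: true
		}
	}
}
```

Note that the nested generators use different key names for the on/off switch as listed above (enable for cacheSimulator, enabled for convergence), so the fragment mirrors the listing rather than normalizing it.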

cuda

These options are used to configure the CUDA Runtime component in Ocelot.

  • implementation - Which version of the runtime to use (CudaRuntime, TraceGeneratingCudaRuntime, RemoteCudaRuntime)
  • runtimeApiTrace - Where to store traces of calls when using the TraceGeneratingCudaRuntime
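
For instance, a cuda section that selects the trace-generating runtime and tells it where to write the API call trace might look like the fragment below. This is a sketch to merge into a full configure.ocelot file. The trace path is the one used in the example on this page, and the unquoted runtime name follows that example's style.

```json
{
	cuda: {
		implementation: TraceGeneratingCudaRuntime,
		runtimeApiTrace: "trace/CudaAPI.trace"
	}
}
```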

executive

These options concern the executive portion of Ocelot. The executive is a low-level hardware abstraction layer that is used to manage Ocelot backend devices.

  • devices - Enable the following backends in Ocelot (llvm, emulated, nvidia, amd)
  • optimizationLevel - Optimization level to apply to kernels (different meanings for different backends)
    • none - No optimizations
    • basic - Most common optimizations
    • space - Try to reduce code size
    • report - Insert very verbose callbacks in the generated code
    • memcheck - Guard all memory accesses with bounds checks
    • debug - Memcheck and report at the same time
    • aggressive - All optimizations that do not increase code size
    • full - All optimizations, even those that increase code size
  • workerThreadLimit - Limit the number of host threads that Ocelot backends are allowed to use.
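
Because trace generators attach to kernels that execute on the emulator device, a configuration meant for tracing would typically enable the emulated backend. The fragment below is a minimal sketch of such an executive section. The single-device list and the thread limit of 1 are illustrative choices, not defaults.

```json
{
	executive: {
		devices: [ emulated ],
		optimizationLevel: none,
		workerThreadLimit: 1
	}
}
```

Any of the optimizationLevel values listed above can be substituted for none. Their effect depends on the backend.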

An Example

This is an example configure.ocelot file:

{
	ocelot: "ocelot",
	trace: { 
		database: "traces/database.trace",
		memory: true,
		memoryChecker: {
			enabled: true,
			checkInitialization: false
		},
		raceDetector: {
			enabled: true,
			ignoreIrrelevantWrites: true
		},
		debugger: {
			enabled: false,
			kernelFilter: "_Z13scalarProdGPUPfS_S_ii",
			alwaysAttach: true
		}
	},
	cuda: {
		implementation: CudaRuntime,
		runtimeApiTrace: "trace/CudaAPI.trace"
	},
	executive: {
		devices: [ nvidia, emulated, llvm, amd ],
		port: 2011,
		host: "127.0.0.1",
		optimizationLevel: none,
		workerThreadLimit: 2,
		warpSize: 4
	},
	optimizations: {
		subkernelSize: 60
	}
}