Dedup: hash modules back->front (#7820)
Hashing modules back->front lets us be a bit leaner with the state we
have to track, and hopefully will give us a nice speed improvement.

In the old algorithm, as we hashed each operation or block in the
module, we would store its position as an index into a side-table. When
a value was used, we recorded the use by hashing in the result number
and the defining operation's index (or, for a block argument, the
argument number and the block's index).

This index table is a major performance bottleneck for dedup: in a large
module, it can be massive. The key observation is that values tend to be
used only near their definition. After we hash the last use of an
operation or block, we should be safe to remove its index from the
table, keeping the table as small as possible.
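
To illustrate the old scheme described above, here is a minimal sketch on a
toy IR. `FwdOp`, `ForwardIndexer`, and `makeFwdChain` are hypothetical names
made up for illustration, not CIRCT types, and a plain vector stands in for
the SHA-256 stream. It only aims to show why the side-table ends up with one
entry per op.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical toy op: the indices of earlier ops whose result it uses.
struct FwdOp {
  std::vector<std::size_t> operands;
};

struct ForwardIndexer {
  // Side-table from op to its position. Entries are never removed, because a
  // front-to-back walk cannot know when it has seen the last use.
  std::unordered_map<const FwdOp *, std::size_t> indices;
  std::vector<std::size_t> trace; // stands in for the bytes fed to the hash

  void run(const std::vector<FwdOp> &ops) {
    for (const FwdOp &op : ops) {
      std::size_t index = indices.size();
      indices.insert({&op, index});
      for (std::size_t def : op.operands) {
        assert(def < index && "operands must be defined earlier");
        trace.push_back(indices.at(&ops[def]));
      }
    }
  }
};

// Builds a chain of n ops in which op i uses only op i-1.
inline std::vector<FwdOp> makeFwdChain(std::size_t n) {
  std::vector<FwdOp> ops(n);
  for (std::size_t i = 1; i < n; ++i)
    ops[i].operands.push_back(i - 1);
  return ops;
}
```

Running this on a 1000-op chain leaves 1000 entries in `indices`, even though
each value has a single use right next to its definition.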

This PR modifies the hasher to walk the module backwards. When a value
is first encountered (while hashing a use/operand), we assign an ID to
the defining operation. We use that ID to hash all future uses.

When the defining op is hashed, we hash its ID once more (recording the
fact that the ID is defined by the op), and remove the ID from the
table. Since a value can only be used after it is defined, the backwards
walk has already seen every use by the time it reaches the definition.
This ensures that we only track the ID of an operation for its live
range in the IR.
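
The backwards walk can be sketched on a toy IR as follows. `ToyOp`,
`BackwardNumbering`, and `makeChain` are hypothetical names for illustration;
`getID` and `finalizeID` mirror the helpers added in this commit but use
`std::unordered_map` instead of `DenseMap`, and a vector of IDs stands in for
the hash stream. Treat it as a sketch under those assumptions, not the actual
implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical toy op: the indices of earlier ops whose result it uses.
struct ToyOp {
  std::vector<std::size_t> operands;
};

struct BackwardNumbering {
  std::unordered_map<const ToyOp *, unsigned> idTable; // live IDs only
  unsigned nextID = 0;
  std::size_t peakLive = 0;    // largest table size seen during the walk
  std::vector<unsigned> trace; // stands in for the bytes fed to the hash

  // Assign an ID the first time an op is encountered through one of its uses.
  unsigned getID(const ToyOp *op) {
    auto result = idTable.insert({op, nextID});
    if (result.second)
      ++nextID;
    peakLive = std::max(peakLive, idTable.size());
    return result.first->second;
  }

  // Called when the defining op itself is reached: return its ID and free it.
  unsigned finalizeID(const ToyOp *op) {
    auto it = idTable.find(op);
    if (it == idTable.end())
      return nextID++; // the op was never used: fresh ID, nothing to free
    unsigned id = it->second;
    idTable.erase(it);
    return id;
  }

  void run(const std::vector<ToyOp> &ops) {
    for (std::size_t i = ops.size(); i-- > 0;) {
      for (std::size_t def : ops[i].operands) {
        assert(def < i && "operands must be defined earlier");
        trace.push_back(getID(&ops[def]));
      }
      trace.push_back(finalizeID(&ops[i]));
    }
  }
};

// Builds a chain of n ops in which op i uses only op i-1.
inline std::vector<ToyOp> makeChain(std::size_t n) {
  std::vector<ToyOp> ops(n);
  for (std::size_t i = 1; i < n; ++i)
    ops[i].operands.push_back(i - 1);
  return ops;
}
```

On a 1000-op chain, `peakLive` stays at 2 (the op being hashed plus its
operand) and `idTable` is empty once the walk finishes.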

The IDs are assigned according to their "first occurrence" in the
backwards walk of the IR. Since the assignment of IDs is derived from
the structure of the IR, two equivalent modules should assign the same
IDs to the same ops.
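
A small sketch of why "first occurrence" numbering is independent of where
the IR lives in memory. `Node` and `idStream` are hypothetical names, and the
returned vector stands in for what would be fed to the hash; the point is
only that separately allocated but structurally identical inputs yield the
same ID stream, while a structural change (here, swapped operands) does not.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy op: the indices of earlier ops whose result it uses.
struct Node {
  std::vector<std::size_t> uses;
};

// Returns the stream of IDs produced by a backwards walk over `nodes`.
inline std::vector<unsigned> idStream(const std::vector<Node> &nodes) {
  std::map<const Node *, unsigned> live; // pointer keys, but IDs are hashed
  unsigned nextID = 0;
  std::vector<unsigned> out;
  for (std::size_t i = nodes.size(); i-- > 0;) {
    // Uses: assign an ID on first occurrence, reuse it afterwards.
    for (std::size_t def : nodes[i].uses) {
      auto result = live.insert({&nodes[def], nextID});
      if (result.second)
        ++nextID;
      out.push_back(result.first->second);
    }
    // Definition: emit the ID once more and drop it from the table.
    auto it = live.find(&nodes[i]);
    if (it == live.end()) {
      out.push_back(nextID++);
    } else {
      out.push_back(it->second);
      live.erase(it);
    }
  }
  return out;
}
```

Two separately constructed three-node inputs whose last node uses nodes 0 and
1 (in that order) produce identical streams; swapping the operand order
changes the stream.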

This PR also updates the hashing of inner-symbols to be handled in a
similar way. When an inner symbol is referenced, we assign an ID and
record the reference by hashing the ID. When an inner symbol is defined,
we record the definition by, again, hashing the ID. Unlike values, a
symbol can be referenced before it is defined, so we can never free
inner symbol IDs. This corrects an old logical "bug" in dedup, which
never arose in practice because Chisel cannot generate IR that triggers
it.
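
The inner-symbol handling can be sketched the same way. `InnerSymNumbering`
and `symStream` are hypothetical names, and `std::string` stands in for
`StringAttr`; `getInnerSymID` follows the shape of the helper in this commit.
Each entry in `events` is a symbol name as it is encountered in the walk,
whether as a reference or as a definition, since both hash the same ID.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct InnerSymNumbering {
  // Unlike the value ID table, entries here are never erased: a symbol may be
  // referenced both before and after its definition is hashed.
  std::unordered_map<std::string, unsigned> ids;
  unsigned nextID = 0;

  unsigned getInnerSymID(const std::string &name) {
    auto result = ids.insert({name, nextID});
    if (result.second)
      ++nextID;
    return result.first->second;
  }
};

// Maps each symbol occurrence to its ID, in walk order.
inline std::vector<unsigned> symStream(InnerSymNumbering &table,
                                       const std::vector<std::string> &events) {
  std::vector<unsigned> out;
  for (const std::string &name : events)
    out.push_back(table.getInnerSymID(name));
  return out;
}
```

Because only the ID is hashed, two modules that differ only in the spelling
of their inner symbol names produce the same stream, and the table keeps one
entry per distinct symbol for the whole walk.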
rwy7 authored Nov 15, 2024
1 parent ac249e5 commit 2d93ce7
Showing 1 changed file with 97 additions and 104 deletions.
201 changes: 97 additions & 104 deletions lib/Dialect/FIRRTL/Transforms/Dedup.cpp
@@ -98,20 +98,6 @@ static bool operator==(const ModuleInfo &lhs, const ModuleInfo &rhs) {
lhs.referredModuleNames == rhs.referredModuleNames;
}

/// Unique identifier for a value. All value sources are numbered by appearance,
/// and values are identified using this numbering (`index`) and an `offset`.
/// For BlockArgument's, this is the argument number.
/// For OpResult's, this is the result number.
struct ValueId {
uint64_t index;
uint64_t offset;
};

struct SymbolTarget {
ValueId index;
uint64_t fieldID;
};

/// This struct contains constant string attributes shared across different
/// threads.
struct StructuralHasherSharedConstants {
@@ -150,14 +136,56 @@ struct StructuralHasherSharedConstants {

struct StructuralHasher {
explicit StructuralHasher(const StructuralHasherSharedConstants &constants)
: constants(constants){};
: constants(constants) {}

ModuleInfo getModuleInfo(FModuleLike module) {
update(&(*module));
return {sha.final(), std::move(referredModuleNames)};
}

private:
// Get the identifier for an object. The identifier is assigned on first use.
unsigned getID(void *object) {
auto [it, inserted] = idTable.try_emplace(object, nextID);
if (inserted)
++nextID;
return it->second;
}

// Get the identifier for an IR object. Free the ID, too.
unsigned finalizeID(void *object) {
auto it = idTable.find(object);
if (it == idTable.end())
return nextID++;
auto id = it->second;
idTable.erase(it);
return id;
}

unsigned getInnerSymID(StringAttr name) {
auto [it, inserted] = innerSymIDTable.try_emplace(name, nextInnerSymID);
if (inserted)
++nextInnerSymID;
return it->second;
}

void update(OpOperand &operand) {
auto value = operand.get();
if (auto result = dyn_cast<OpResult>(value)) {
auto *op = result.getOwner();
update(getID(op));
update(result.getResultNumber());
return;
}
if (auto argument = dyn_cast<BlockArgument>(value)) {
auto *block = argument.getOwner();
update(getID(block));
update(argument.getArgNumber());
return;
}
llvm_unreachable("Unknown value type");
}

void update(const void *pointer) {
auto *addr = reinterpret_cast<const uint8_t *>(&pointer);
sha.update(ArrayRef<uint8_t>(addr, sizeof pointer));
@@ -186,20 +214,6 @@ struct StructuralHasher {
update(type.getAsOpaquePointer());
}

void record(void *address) {
auto size = indices.size();
assert(!indices.contains(address));
indices[address] = size;
}

/// Get the unique id for the specified value.
ValueId getId(Value val) {
if (auto arg = dyn_cast<BlockArgument>(val))
return {indices.at(arg.getOwner()), arg.getArgNumber()};
auto result = cast<OpResult>(val);
return {indices.at(result.getOwner()), result.getResultNumber()};
}

void update(OpResult result) {
// Like instance ops, don't use object ops' result types since they might be
// replaced by dedup. Record the class names and lazily combine their hashes
@@ -212,39 +226,6 @@ struct StructuralHasher {
update(result.getType());
}

void update(ValueId index) {
update(index.index);
update(index.offset);
}

void update(OpOperand &operand) { update(getId(operand.get())); }

void update(Operation *op, hw::InnerSymAttr attr) {
for (auto props : attr)
innerSymTargets[props.getName()] =
SymbolTarget{{indices.at(op), 0}, props.getFieldID()};
}

void update(Value value, hw::InnerSymAttr attr) {
for (auto props : attr)
innerSymTargets[props.getName()] =
SymbolTarget{getId(value), props.getFieldID()};
}

void update(const SymbolTarget &target) {
update(target.index);
update(target.fieldID);
}

void update(InnerRefAttr attr) {
// We hash the value's index as it appears in the block.
auto it = innerSymTargets.find(attr.getName());
assert(it != innerSymTargets.end() &&
"inner symbol should have been previously hashed");
update(attr.getTypeID());
update(it->second);
}

/// Hash the top level attribute dictionary of the operation. This function
/// has special handling for inner symbols, ports, and referenced modules.
void update(Operation *op, DictionaryAttr dict) {
@@ -258,6 +239,9 @@ struct StructuralHasher {
if (constants.nonessentialAttributes.contains(name) && !isClassPortNames)
continue;

// Hash the attribute name (an interned pointer).
update(name.getAsOpaquePointer());

// Hash the port types.
if (name == constants.portTypesAttr) {
auto portTypes = cast<ArrayAttr>(value).getAsValueRange<TypeAttr>();
@@ -273,17 +257,21 @@ struct StructuralHasher {
auto &region = op->getRegion(0);
if (region.getBlocks().empty())
continue;
auto *block = &region.front();
auto syms = cast<ArrayAttr>(value).getAsRange<hw::InnerSymAttr>();
if (syms.empty())
continue;
for (auto [arg, sym] : llvm::zip_equal(block->getArguments(), syms))
update(arg, sym);
for (auto sym : cast<ArrayAttr>(value).getAsRange<hw::InnerSymAttr>()) {
for (auto property : sym) {
update(property.getFieldID());
update(getInnerSymID(property.getName()));
}
}
continue;
}

if (name == constants.innerSymAttr) {
auto innerSym = cast<hw::InnerSymAttr>(value);
update(op, innerSym);
for (auto property : innerSym) {
update(property.getFieldID());
update(getInnerSymID(property.getName()));
}
continue;
}

@@ -297,17 +285,14 @@ struct StructuralHasher {
continue;
}

// Hash the interned pointer.
update(name.getAsOpaquePointer());

// TODO: properly handle DistinctAttr, including its use in paths.
// See https://github.com/llvm/circt/issues/6583.
if (isa<DistinctAttr>(value))
continue;

// If this is a symbol reference, we need to perform name erasure.
if (auto innerRef = dyn_cast<hw::InnerRefAttr>(value))
update(innerRef);
update(getInnerSymID(innerRef.getName()));
else
update(value.getAsOpaquePointer());
}
@@ -318,55 +303,60 @@ struct StructuralHasher {
update(name.getAsOpaquePointer());
}

// NOLINTNEXTLINE(misc-no-recursion)
void update(Block *block) {
for (auto &op : llvm::reverse(*block))
update(&op);
for (auto type : block->getArgumentTypes())
update(type);
update(finalizeID(block));
update(position);
++position;
}

// NOLINTNEXTLINE(misc-no-recursion)
void update(Region *region) {
for (auto &block : llvm::reverse(region->getBlocks()))
update(&block);
update(position);
++position;
}

// NOLINTNEXTLINE(misc-no-recursion)
void update(Operation *op) {
if (op->getNumResults())
record(op);
else if (auto innerSym = dyn_cast<hw::InnerSymbolOpInterface>(op))
if (auto attr = innerSym.getInnerSymAttr())
if (!attr.empty())
record(op);
// Hash the regions. We need to make sure an empty region doesn't hash the
// same as no region, so we include the number of regions.
update(op->getNumRegions());
for (auto &region : reverse(op->getRegions()))
update(&region);

update(op->getName());

// Hash the operands.
// Record the uses for later hashing.
for (auto &operand : op->getOpOperands())
update(operand);

// Number the block pointers, for use numbering their arguments.
for (auto &region : op->getRegions())
for (auto &block : region.getBlocks())
record(&block);

// This happens after the numbering above, as it uses blockarg numbering
// for inner symbols.
update(op, op->getAttrDictionary());

// Hash the regions. We need to make sure an empty region doesn't hash the
// same as no region, so we include the number of regions.
update(op->getNumRegions());
for (auto &region : op->getRegions()) {
update(region.getBlocks().size());
for (auto &block : region.getBlocks()) {
update(indices.at(&block));
for (auto argType : block.getArgumentTypes())
update(argType);
for (auto &op : block)
update(&op);
}
}

// Record any op results (types).
for (auto result : op->getResults())
update(result);

// Incorporate the hash of uses we have already built.
update(finalizeID(op));
update(position);
++position;
}

// Every operation and block is assigned a unique id based on their order of
// appearance. All values are uniquely identified using these.
DenseMap<void *, unsigned> indices;
// A map from an operation/block, to its identifier.
DenseMap<void *, unsigned> idTable;
unsigned nextID = 0;

// Track inner symbol name -> target's unique identification.
DenseMap<StringAttr, SymbolTarget> innerSymTargets;
// A map from an inner symbol, to its identifier.
DenseMap<StringAttr, unsigned> innerSymIDTable;
unsigned nextInnerSymID = 0;

// This keeps track of module names in the order of the appearance.
std::vector<StringAttr> referredModuleNames;
@@ -377,6 +367,9 @@ struct StructuralHasher {
// This is the actual running hash calculation. This is a stateful element
// that should be reinitialized after each hash is produced.
llvm::SHA256 sha;

// The index of the current op. Increment after handling each op.
size_t position = 0;
};

//===----------------------------------------------------------------------===//