Implement product and group IDs #8

bernhard-herzog · 2024-12-18T17:36:56Z

The product and group IDs are now generated in a way that respects referential integrity, i.e. an ID that is referenced in one part of the document is actually defined in another.

Implements #1

When deriving the template from the schema, several schemas can have the same underlying type because they both refer to the same type. So far, the code correctly used the definition of that shared type, but used different names when putting it into the template's Types map. With this change, the name is derived from the shared underlying type. This leads to more sharing of types, which makes it easier to implement special handling of, e.g., the product_id type, which is referenced in many places.
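A minimal Go sketch of the naming idea, assuming a heavily simplified schema node whose `Ref` field holds the target of a schema reference. `Schema`, `typeName` and `addType` are hypothetical names, not fakedoc's actual API, and the `#/$defs/...` reference format is only illustrative. Every schema that points at the same definition gets the same name, so the Types map ends up with one shared entry:

```go
package main

import "strings"

// Schema is a hypothetical, heavily simplified stand-in for a parsed
// JSON schema node. Ref holds the target of a reference, if any.
type Schema struct {
	Ref  string // e.g. "#/$defs/product_id_t" (illustrative)
	Type string
}

// typeName derives the template type name. If the schema refers to a
// shared definition, the name comes from that definition, so all schemas
// pointing at it map to the same name. Otherwise the path is used.
func typeName(s *Schema, path string) string {
	if s.Ref != "" {
		return s.Ref[strings.LastIndex(s.Ref, "/")+1:]
	}
	return path
}

// addType registers the schema under its derived name, once per name.
func addType(types map[string]*Schema, s *Schema, path string) string {
	name := typeName(s, path)
	if _, ok := types[name]; !ok {
		types[name] = s
	}
	return name
}
```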

During the construction of the tree for the random CSAF document, it can happen that a particular branch cannot be fully constructed because some constraint cannot be met. So far, the main reason for that is that the maximum depth would be exceeded. In this case, the generateNode method returns an error that indicates this, and the generator tries to recover by trying something else, e.g. not generating the object property where the problem occurred and trying to add a different one instead. The implementation of product IDs will add another reason for pruning a branch, so we introduce an error value that forms the basis for all reasons for pruning. We also now handle these errors in more places, even though these are currently not places where such errors can occur with the built-in template.

If generating an array with uniqueItems=true fails, we can treat those cases like depth exceeded: simply don't generate the array at all and let the parent try something else.

The product tree contains products that are identified by IDs that may be referenced from elsewhere in the document. Those references must be generated in such a way that they point to existing products. With this commit, the generator can do that, although it doesn't work well enough yet: in about 2.5% of all attempts, no document is generated at all. The basic approach is to have two new template types, one to generate IDs and another for places that reference them. During generation, references are only created at all if at least one ID has been generated, and they're initially added as placeholders. Once the full tree has been generated, the placeholders are filled with IDs randomly chosen from all that were generated. Because this will have to be done for other kinds of IDs as well, such as group IDs, the new template types have a namespace parameter that is used to distinguish them.
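The placeholder approach can be sketched as follows. All names (`Generator`, `Ref`, `GenerateID`, `GenerateRef`, `FixupRefs`) and the ID format are hypothetical; the sketch only shows the two-phase idea of recording IDs per namespace during generation and resolving references once the tree is complete:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Ref is a reference placeholder. Value stays empty until FixupRefs runs.
type Ref struct {
	Namespace string
	Value     string
}

// Generator records generated IDs per namespace and pending placeholders.
type Generator struct {
	ids  map[string][]string // namespace -> IDs generated so far
	refs []*Ref              // placeholders to fill in at the end
	rng  *rand.Rand
}

func NewGenerator(seed int64) *Generator {
	return &Generator{ids: map[string][]string{}, rng: rand.New(rand.NewSource(seed))}
}

// GenerateID creates a new ID in the given namespace (hypothetical format).
func (g *Generator) GenerateID(namespace string) string {
	id := fmt.Sprintf("%s-%04d", namespace, len(g.ids[namespace])+1)
	g.ids[namespace] = append(g.ids[namespace], id)
	return id
}

// GenerateRef returns a placeholder, or ok == false if no ID exists yet
// in the namespace, in which case no reference should be generated.
func (g *Generator) GenerateRef(namespace string) (ref *Ref, ok bool) {
	if len(g.ids[namespace]) == 0 {
		return nil, false
	}
	r := &Ref{Namespace: namespace}
	g.refs = append(g.refs, r)
	return r, true
}

// FixupRefs fills every placeholder with a randomly chosen generated ID.
func (g *Generator) FixupRefs() {
	for _, r := range g.refs {
		known := g.ids[r.Namespace]
		r.Value = known[g.rng.Intn(len(known))]
	}
}
```

Since a reference is never created for an empty namespace, `FixupRefs` always has at least one ID to choose from.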

When generating group_ids, we need to be able to generate arrays with at least two product_id references, all of which have to be different. With the current approach, that doesn't work: the randomArray method instantiates TmplRef several times, but all generated reference values are equal because they don't have an actual value yet. Since they're equal, randomArray rejects all but the first one because of the uniqueness requirement, and the array ends up too short because it must contain at least two values. The solution chosen here is to add a special case to randomArray: if the items are TmplRef and there's a uniqueness constraint, we add a placeholder that represents an array of distinct references, which are filled in at the end like the other references. This new kind of reference is represented by the same struct type as the other reference placeholder, but that type now has an additional length field that is used to distinguish the cases (see the comments for the reference struct).

When generating product groups, we need to generate the product_ids property before the group_id property. Group IDs are defined by the product_groups items. If generating the product_ids for a group fails because, e.g., there are no product IDs, the entire product group fails and will not be in the document. If we had already generated a group_id for the group, the generator would have a known group ID and would therefore happily generate references to it elsewhere in the document, but the document would not actually define it, leading to dangling references. The dependency between properties allows us to indicate that group_id depends on product_ids and that the latter must therefore be attempted first.

The group IDs are now handled basically in the same way as the product IDs. The built-in template is automatically modified with group-ID-specific settings that mostly work like the ones for the product IDs, with one exception: the group IDs use the dependency mechanism between properties to make sure that in product_group objects the group ID is only generated when product IDs are available.

The required and depends fields of properties can be omitted if they have their default values. Since many properties use the defaults, these fields are omitted from the TOML serialization of the built-in template.
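A stdlib-only sketch of omitting defaults, assuming a simplified `Property` whose zero values for `Required` and `Depends` are the defaults, rendered as an inline TOML table. The struct and method names are hypothetical, and `%q` quoting matches TOML basic strings only for plain ASCII values, which suffices here:

```go
package main

import (
	"fmt"
	"strings"
)

// Property is a hypothetical, simplified template property. The zero
// values of Required and Depends are assumed to be the defaults.
type Property struct {
	Name     string
	Type     string
	Required bool
	Depends  string
}

// toTOML renders the property as an inline TOML table, leaving out
// fields that still hold their default value.
func (p Property) toTOML() string {
	fields := []string{
		fmt.Sprintf("name = %q", p.Name),
		fmt.Sprintf("type = %q", p.Type),
	}
	if p.Required {
		fields = append(fields, "required = true")
	}
	if p.Depends != "" {
		fields = append(fields, fmt.Sprintf("depends = %q", p.Depends))
	}
	return "{ " + strings.Join(fields, ", ") + " }"
}
```

TOML libraries such as github.com/BurntSushi/toml offer an `omitempty` struct tag option for similar purposes.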

The depends attribute for properties is a workaround for the problem that fakedoc could generate references to nonexistent group IDs. This happened because once an ID was generated, it couldn't be removed from the namespace, even if the branch for which it was generated had been abandoned during generation, e.g. for lack of product IDs. See 6a957d1 for more details.

As it turns out, there's an easy way of removing IDs generated in abandoned branches¹: we take a snapshot of the generator's namespace state before attempting to create a branch (basically the entirety of generateNode), attempt to create the branch, and if that fails with an error based on ErrBranchAbandoned, we restore the generator's namespace state from the snapshot. The key insight for why this works is that if any group IDs have been created during the attempt to generate the branch, then any references that might have been created because of the existence of those IDs are also in that branch. So restoring the snapshot removes all the consequences of ID generation that happened in the attempt.

So forcing the order in which some properties are generated is not needed any more. It was a workaround that only worked in highly specific circumstances (it was introduced specifically for product groups) and is hard to explain to users who want to modify the templates, because it requires somewhat detailed knowledge about how fakedoc works. The new approach works better and doesn't need to be understood by users for IDs to work.

¹ Branch in the generic tree sense, not necessarily CSAF branches_t.

The Depends field of properties was only introduced to work around problems with group IDs, which have been solved in a different way with 6b3947f.

bernhard-herzog added 13 commits December 6, 2024 17:02

Make fromSchema return the type name so that callers don't have to guess

f5db2f2

Better handling of generation failures of arrays with uniqueitems=true

28fb6c3

Omit optional fields with default values from properties in TOML

3b2b050

Remove Property's Depends field and related code

192e6bd

Add documentation for id and ref

35506b5

Rename local variable to avoid shadowing a built-in

4963541

bernhard-herzog requested a review from s-l-teichmann December 18, 2024 17:36

bernhard-herzog commented Dec 18, 2024
