Implement product and group IDs #8

Open
wants to merge 13 commits into
base: main

bernhard-herzog
Contributor

The product and group IDs are now generated in a way that respects referential integrity, i.e. an ID that is referenced in one part of the document is actually defined in another.

Implements #1

When deriving the template from the schema, some schemas have the same
underlying type, because they both use $ref to refer to the same type.
So far, the code correctly used the definition of that shared type, but
used different names when putting it into the template's Types map. With
this change, that name is derived from the shared underlying type.
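The naming idea can be sketched roughly as follows. This is only an illustration under assumptions: the Schema struct, the typeName function, the "#/$defs/..." reference format and the property paths are hypothetical and not taken from the fakedoc code.

```go
package main

import (
	"fmt"
	"strings"
)

// Schema is a hypothetical, heavily simplified stand-in for a JSON schema
// node. Ref holds the target of a $ref, or is empty if there is none.
type Schema struct {
	Ref string
}

// typeName returns the name under which the type for s would be stored in
// the template's types map. If s refers to a shared definition, the name is
// derived from the reference target, so all schemas that refer to the same
// definition end up with the same name. Otherwise the name is derived from
// the location (path) of the schema.
func typeName(path string, s Schema) string {
	if s.Ref != "" {
		// Keep only the last segment, e.g. "#/$defs/product_id_t"
		// becomes "product_id_t".
		return s.Ref[strings.LastIndex(s.Ref, "/")+1:]
	}
	return path
}

func main() {
	ref := Schema{Ref: "#/$defs/product_id_t"}
	// Two different locations that share one underlying type.
	fmt.Println(typeName("vulnerabilities.product_status.fixed.items", ref))
	fmt.Println(typeName("product_tree.full_product_names.product_id", ref))
	// A schema without $ref keeps a location-based name.
	fmt.Println(typeName("document.title", Schema{}))
}
```

With names derived this way, special handling for a type such as product_id only has to be attached to one entry in the map.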

This leads to more sharing of types which makes it easier to deal with
special handling of e.g. the product_id type which is referenced in many
places.

During the construction of the tree for the random CSAF document, it can
happen that a particular branch cannot be fully constructed, because
some constraint cannot be met. The main reason for that so far is that
the maximum depth would be exceeded. In this case the generateNode
method returns an error that indicates this and the generator will try
to recover from this by trying something else, like not generating the
object property where the problem occurred and trying to add a different
one.

The implementation of product IDs will add another reason for pruning a
branch, so we introduce an error value that forms the basis for all
reasons for pruning.
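A minimal sketch of such a base error, using Go's standard error wrapping. ErrBranchAbandoned is the name used later in this pull request; ErrDepthExceeded, ErrNoKnownIDs, prop, generateProperty and generateObject are hypothetical names chosen for illustration and may not match the real code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBranchAbandoned is the base for all reasons to prune a branch.
var ErrBranchAbandoned = errors.New("branch abandoned")

// Specific reasons wrap the base error, so callers only have to test for
// the base. Both names are assumptions made for this sketch.
var (
	ErrDepthExceeded = fmt.Errorf("%w: maximum depth exceeded", ErrBranchAbandoned)
	ErrNoKnownIDs    = fmt.Errorf("%w: no IDs to reference", ErrBranchAbandoned)
)

// prop is a hypothetical property that needs a certain depth to generate.
type prop struct {
	name  string
	depth int
}

// generateProperty fails with a pruning error if the depth limit is exceeded.
func generateProperty(p prop, maxDepth int) (string, error) {
	if p.depth > maxDepth {
		return "", ErrDepthExceeded
	}
	return "value of " + p.name, nil
}

// generateObject skips properties whose branch was abandoned and moves on
// to the next one. Any other error aborts the generation.
func generateObject(props []prop, maxDepth int) (map[string]string, error) {
	obj := make(map[string]string)
	for _, p := range props {
		value, err := generateProperty(p, maxDepth)
		if errors.Is(err, ErrBranchAbandoned) {
			continue // prune this branch, try something else
		}
		if err != nil {
			return nil, err
		}
		obj[p.name] = value
	}
	return obj, nil
}

func main() {
	props := []prop{{"title", 1}, {"notes", 4}}
	obj, err := generateObject(props, 3)
	fmt.Println(obj, err)
}
```

Because the check uses errors.Is against the base value, a new reason for pruning can be added later without touching the places that recover from pruning.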

We also now handle these errors in more places, even though those are
currently not places where these errors can occur for the built-in
template.

We can treat those cases like depth exceeded and simply not generate the
array at all and let the parent try to do something else.

The product tree contains products that are identified with IDs that may
be referenced from elsewhere in the document. Those references must be
generated in such a way that they point to existing products. With this
commit the generator can do that, although at the moment it doesn't work
well enough yet. In about 2.5% of all attempts, no document will be
generated at all.

The basic approach is to have two new template types, one to generate
IDs and another for places that reference them. During generation
references are only created at all if at least one ID has been generated
and they're initially added with a placeholder. Once the full tree has
been generated, the placeholders are filled with IDs randomly chosen
from all that were generated.
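The two-phase approach described above might look roughly like this. It is a sketch under assumptions: the generator and reference types and the newID, newRef and resolve methods are hypothetical and only illustrate the idea of placeholders that are filled in after the tree is complete.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reference is a hypothetical placeholder for a reference to an ID.
// value stays empty until the whole tree has been generated.
type reference struct {
	namespace string
	value     string
}

// generator tracks the IDs generated so far and all placeholders.
type generator struct {
	rng  *rand.Rand
	ids  map[string][]string // generated IDs per namespace
	refs []*reference        // placeholders waiting to be filled
}

func newGenerator(seed int64) *generator {
	return &generator{
		rng: rand.New(rand.NewSource(seed)),
		ids: make(map[string][]string),
	}
}

// newID creates a fresh ID in the given namespace and records it.
func (g *generator) newID(namespace string) string {
	id := fmt.Sprintf("%s-%d", namespace, len(g.ids[namespace])+1)
	g.ids[namespace] = append(g.ids[namespace], id)
	return id
}

// newRef returns a placeholder, or false if no ID has been generated in
// the namespace yet, in which case no reference is created at all.
func (g *generator) newRef(namespace string) (*reference, bool) {
	if len(g.ids[namespace]) == 0 {
		return nil, false
	}
	r := &reference{namespace: namespace}
	g.refs = append(g.refs, r)
	return r, true
}

// resolve fills every placeholder with an ID chosen at random from all
// IDs generated in its namespace. It runs once the tree is complete.
func (g *generator) resolve() {
	for _, r := range g.refs {
		known := g.ids[r.namespace]
		r.value = known[g.rng.Intn(len(known))]
	}
}

func main() {
	g := newGenerator(1)
	if _, ok := g.newRef("product_id"); !ok {
		fmt.Println("no product IDs yet, reference skipped")
	}
	g.newID("product_id")
	g.newID("product_id")
	ref, _ := g.newRef("product_id")
	g.resolve()
	fmt.Println("reference resolved to", ref.value)
}
```

Deferring the choice until the end means a reference can point to any ID in the document, including ones generated after the placeholder was created.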

Because this will have to be done with other kinds of IDs, such as group
IDs, the new template types have a namespace parameter that is used to
distinguish them.

When generating group_ids we need to be able to generate arrays with
at least two product_id references, all of which have to be different.

With the current approach that doesn't work. The randomArray method
ends up instantiating TmplRef a bunch of times, but all the reference
values generated are equal because they don't have an actual value yet.
Because they're equal, randomArray rejects all but the first one due to
the uniqueness requirement, and the array ends up too short because it
must contain at least two values.

The solution chosen here is to add a special case to randomArray: if the
items are TmplRef and there's a unique constraint, we add a placeholder
that represents an array of distinct references, which are filled in at
the end like the other references. This new kind of reference is
represented by the same struct type as the other reference placeholder,
but it now has an additional length field that is used to distinguish the
cases (see the comments for the reference struct).
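One way such an array placeholder could be resolved is sketched below. The struct layout, the meaning of length (assumed here: 0 for a single reference, n > 0 for an array of n distinct references) and the resolver type are assumptions for illustration. The actual reference struct is documented in the code and may differ.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reference is a hypothetical placeholder. Assumption for this sketch:
// length == 0 means a single reference, length > 0 means an array of that
// many distinct references.
type reference struct {
	namespace string
	length    int
	values    []string // filled in during resolution
}

// resolver knows all IDs generated per namespace.
type resolver struct {
	rng *rand.Rand
	ids map[string][]string
}

func newResolver(seed int64, ids map[string][]string) *resolver {
	return &resolver{rng: rand.New(rand.NewSource(seed)), ids: ids}
}

// resolve fills the placeholder with distinct IDs from its namespace.
// Distinctness follows from taking a prefix of a random permutation.
func (r *resolver) resolve(ref *reference) error {
	known := r.ids[ref.namespace]
	n := ref.length
	if n == 0 {
		n = 1 // single reference
	}
	if n > len(known) {
		return fmt.Errorf("need %d distinct IDs in %q, have %d",
			n, ref.namespace, len(known))
	}
	perm := r.rng.Perm(len(known))
	ref.values = make([]string, n)
	for i := range ref.values {
		ref.values[i] = known[perm[i]]
	}
	return nil
}

func main() {
	r := newResolver(1, map[string][]string{"product_id": {"p1", "p2", "p3"}})
	group := &reference{namespace: "product_id", length: 2}
	if err := r.resolve(group); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(group.values)
}
```

Since the whole array is a single placeholder, the uniqueness check in randomArray never has to compare unresolved references with each other.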
When generating product groups, we need to generate the product_ids
property before the group_id property. Group IDs are defined by the
product_groups items. If generating the product_ids for a group fails
because e.g. there are no product_ids the entire product group fails and
will not be in the document. If we had already generated a group_id for
the group, the generator would have a known group ID and therefore
happily generate references to it elsewhere in the document, but the
document would not actually define it, leading to dangling references.

The dependency between properties allows us to indicate that group_id
depends on product_ids and therefore the latter must be attempted first.

The group IDs are now handled basically in the same way as the product
IDs. The built-in template is automatically modified with group ID
specific settings that mostly work like the ones for the product IDs,
with one exception: the group IDs use the dependency mechanism between
properties to make sure that in product_group objects the group ID is
only generated when there are product IDs available.

The required and depends fields of properties can be omitted if they
have default values. Since they have the defaults for many properties,
they're omitted from the TOML serialization of the built-in template.

The depends attribute for properties is a workaround for the problem
that fakedoc could generate references to nonexistent group IDs, which
happened because once an ID was generated it couldn't be removed from
the namespace even if the branch for which it was generated had been
abandoned during generation for e.g. lack of product IDs. See 6a957d1
for more details.

As it turns out, there's an easy way of removing IDs generated in
abandoned branches¹: We take a snapshot of the generator's namespace
state before attempting to create a branch (basically the entirety of
generateNode), attempt to create the branch, and if it fails because of
an error based on ErrBranchAbandoned we restore the generator's
namespace state to the snapshot.

The key insight for why this works is that if any group IDs have been
created during the attempt to generate the branch, then any references
that might have been created because of the existence of those IDs are
also in that branch. So restoring the snapshot removes all the
consequences of ID generation that happened in the attempt.
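The snapshot and restore steps can be sketched like this. It assumes, for illustration only, that the namespace state is just a list of IDs per namespace to which IDs are only ever appended. The generator layout and the snapshot, restore and attempt methods are hypothetical; only ErrBranchAbandoned is a name from this pull request.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBranchAbandoned is the base for all reasons to prune a branch.
var ErrBranchAbandoned = errors.New("branch abandoned")

// generator holds the IDs generated so far, per namespace (assumed layout).
type generator struct {
	ids map[string][]string
}

// snapshot records how many IDs each namespace currently has. Because IDs
// are only appended in this sketch, the lengths are enough to roll back.
func (g *generator) snapshot() map[string]int {
	snap := make(map[string]int, len(g.ids))
	for ns, ids := range g.ids {
		snap[ns] = len(ids)
	}
	return snap
}

// restore drops all IDs that were added after the snapshot was taken.
// Namespaces that did not exist at snapshot time are emptied.
func (g *generator) restore(snap map[string]int) {
	for ns, ids := range g.ids {
		g.ids[ns] = ids[:snap[ns]]
	}
}

// attempt tries to build a branch and rolls the namespace state back if
// the branch was abandoned. The error is passed on to the caller.
func (g *generator) attempt(build func() error) error {
	snap := g.snapshot()
	err := build()
	if errors.Is(err, ErrBranchAbandoned) {
		g.restore(snap)
	}
	return err
}

func main() {
	g := &generator{ids: map[string][]string{"group_id": {"g1"}}}
	// A product group that defines a group ID and then fails.
	err := g.attempt(func() error {
		g.ids["group_id"] = append(g.ids["group_id"], "g2")
		return fmt.Errorf("%w: no product IDs available", ErrBranchAbandoned)
	})
	fmt.Println(err)
	fmt.Println(g.ids["group_id"]) // g2 is gone again
}
```

No ordering between properties is needed here: whatever the abandoned branch added to the namespaces disappears together with the branch.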

So, forcing the order in which some properties are generated is not
needed any more. It was a workaround that only worked in highly
specific circumstances (it was introduced specifically for product
groups) and is hard to explain to users who want to modify the templates
because it requires somewhat detailed knowledge about how fakedoc works.
The new approach works better and doesn't need to be understood by users
for IDs to work.

¹ branch in the generic tree sense, not necessarily CSAF branches_t

This was only introduced to work around problems with group IDs which
have been solved in a different way with 6b3947f