Simplify grype DB access abstractions #2132

Open · 5 of 8 tasks
wagoodman opened this issue Sep 17, 2024 · 0 comments · May be fixed by #2311
Labels: enhancement (New feature or request)
wagoodman commented Sep 17, 2024

Here are the rough steps today (with DB schemas v1-5) to get a match from the DB, starting within a matcher:

  1. Matchers use the search package Criteria to access the given vulnerability.Provider, where the provider is DB-schema agnostic and passed to the matcher
  2. Static helper functions in the search package query and refine the final set of vulnerability candidates, using the given vulnerability.Provider
  3. The vulnerability.Provider then uses the DB-specific store reader to find vulnerabilities, and returns normalized vulnerability objects (agnostic to the DB schema)
  4. The store reader accesses the underlying sqlite store directly and raises DB-specific vulnerability objects
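The layering above can be sketched as a minimal, runnable Go program. All type and function names here are simplified stand-ins for illustration, not the real grype types:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real grype types.
type Vulnerability struct{ ID string }
type Match struct{ Vulnerability Vulnerability }

// Layer 4: the DB-specific store reader raises schema-specific rows.
type storeReader struct{ rows map[string][]string } // package name -> vuln IDs

func (s storeReader) getRows(name string) []string { return s.rows[name] }

// Layer 3: the vulnerability.Provider analog -- schema agnostic, so it must
// fully deserialize every candidate row into a normalized object.
type provider struct{ store storeReader }

func (p provider) ByPackageName(name string) []Vulnerability {
	var vulns []Vulnerability
	for _, id := range p.store.getRows(name) {
		vulns = append(vulns, Vulnerability{ID: id})
	}
	return vulns
}

// Layer 2: a static search helper; note the provider must be passed in on
// every call, which is one of the observations below.
func byPackageName(p provider, name string) []Match {
	var matches []Match
	for _, v := range p.ByPackageName(name) {
		matches = append(matches, Match{Vulnerability: v})
	}
	return matches
}

func main() {
	// Layer 1: a matcher wires the layers together per call.
	p := provider{store: storeReader{rows: map[string][]string{"openssl": {"CVE-1"}}}}
	fmt.Println(len(byPackageName(p, "openssl")))
}
```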

Some observations from this:

  • The search functions are static and require continually passing a provider
    • In v6 we improve on this by requiring the matcher to instantiate a client (with specific configuration and access to the store) at construction; the client provides the raw search functionality
  • The provider is DB agnostic, which requires that all data is fully deserialized from the DB (even when it is not needed)
    • In v6 we improve on this by searching against indexed tables and only deserializing related blobs when they are needed beyond a specific step

Changes

(feel free to browse the prototype)

The search package should be where DB model deserialization occurs to leverage as many optimizations as possible while searching. This would remove some unnecessary abstractions (the vulnerability.Provider).

Matchers search by criteria against a client, where the client is driven by search criteria and is passed into the matcher at construction.

Motivating example (not finalized)
// from within the search package

type Resources struct {
	Store             v6.StoreReader
	AttributedMatcher match.MatcherType
}

type Criteria func(Resources) ([]match.Match, error)

type Interface interface {
	GetMetadata(id, namespace string) (*vulnerability.Metadata, error)
	ByCriteria(criteria ...Criteria) ([]match.Match, error)
}

type Client struct {
	resources Resources
}

func NewClient(store v6.StoreReader, matcherType match.MatcherType) *Client {
	return &Client{
		resources: Resources{
			Store:             store,
			AttributedMatcher: matcherType,
		},
	}
}

func (c Client) ByCriteria(criteria ...Criteria) ([]match.Match, error) {
	var matches []match.Match
	for _, criterion := range criteria {
		m, err := criterion(c.resources)
		if err != nil {
			return nil, err
		}
		// TODO: add matcher type to all matches...
		matches = append(matches, m...)
	}
	return matches, nil
}
Example search criteria function
// from within the search package

func ByCPE(p pkg.Package) Criteria {
	return func(r Resources) ([]match.Match, error) {
		// use db v6 specific indexes to raise matches -- r.Store.Get*()
		// use common functions like onlyVulnerableMatches(), etc.,
		// to account for platform CPE, version filtering, etc.
	}
}

This allows matchers to implement their own custom criteria while also using common criteria.

Note from the above example that we're able to get raw DB models, but we're still getting them from a store object that is tailored to know how to access the objects efficiently.
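As a sketch of that composition, here is a runnable rendition of the client-plus-criteria pattern, with hypothetical stand-in types (Resources is reduced to a slice of names rather than a real v6.StoreReader, and ByName stands in for a common criterion like ByCPE):

```go
package main

import "fmt"

// Hypothetical stand-ins for the prototype's types.
type Match struct{ Detail string }
type Resources struct{ Names []string } // stands in for the v6 store + matcher type

type Criteria func(Resources) ([]Match, error)

type Client struct{ resources Resources }

// ByCriteria runs each criterion against the shared resources and
// accumulates the matches, mirroring the motivating example above.
func (c Client) ByCriteria(criteria ...Criteria) ([]Match, error) {
	var matches []Match
	for _, criterion := range criteria {
		m, err := criterion(c.resources)
		if err != nil {
			return nil, err
		}
		matches = append(matches, m...)
	}
	return matches, nil
}

// A "common" criterion shipped by the search package.
func ByName(name string) Criteria {
	return func(r Resources) ([]Match, error) {
		var out []Match
		for _, n := range r.Names {
			if n == name {
				out = append(out, Match{Detail: "name:" + name})
			}
		}
		return out, nil
	}
}

// A custom criterion defined by a single matcher.
func myMatcherCriterion() Criteria {
	return func(r Resources) ([]Match, error) {
		return []Match{{Detail: "custom"}}, nil
	}
}

func main() {
	c := Client{resources: Resources{Names: []string{"openssl"}}}
	matches, _ := c.ByCriteria(ByName("openssl"), myMatcherCriterion())
	fmt.Println(len(matches))
}
```

The point of the pattern is that both kinds of criteria share one signature, so the client can mix them in a single ByCriteria call.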

A motivating example (not final)
// affected_package_store from within the db/v6 package

type AffectedPackageStoreWriter interface {
	AddAffectedPackages(packages ...*AffectedPackageHandle) error
}

type AffectedPackageStoreReader interface {
	GetPackageByNameAndDistro(packageName, distroName, majorVersion string, minorVersion *string) ([]AffectedPackageHandle, error)
}

type affectedPackageStore struct {
	*StoreConfig
	*state
	blobStore *blobStore
}

func newAffectedPackageStore(cfg *StoreConfig, bs *blobStore) *affectedPackageStore {
	return &affectedPackageStore{
		StoreConfig: cfg,
		state:       cfg.state(),
		blobStore:   bs,
	}
}

func (s *affectedPackageStore) AddAffectedPackages(packages ...*AffectedPackageHandle) error {
	for _, v := range packages {
		if v.Package != nil {
			var existingPackage Package
			result := s.db.Where("name = ? AND type = ?", v.Package.Name, v.Package.Type).FirstOrCreate(&existingPackage, v.Package)
			if result.Error != nil {
				return fmt.Errorf("failed to create package (name=%q type=%q): %w", v.Package.Name, v.Package.Type, result.Error)
			}
			v.Package = &existingPackage
		}

		if err := s.blobStore.AddAffectedPackageBlob(v); err != nil {
			return fmt.Errorf("unable to add affected blob: %w", err)
		}
		if err := s.db.Create(v).Error; err != nil {
			return err
		}
	}
	return nil
}


func (s *affectedPackageStore) GetPackageByNameAndDistro(packageName, distroName, majorVersion string, minorVersion *string) ([]AffectedPackageHandle, error) {
	version := majorVersion
	if minorVersion != nil {
		version = majorVersion + "." + *minorVersion
	}
	log.WithFields("name", packageName, "distro", distroName+"@"+version).Trace("fetching Package record")

	var pkgs []AffectedPackageHandle
	query := s.db.Where("package_name = ? AND operating_system.name = ? AND operating_system.major_version = ?", packageName, distroName, majorVersion)

	if minorVersion != nil {
		query = query.Where("operating_system.minor_version = ?", *minorVersion)
	} else {
		query = query.Where("operating_system.minor_version IS NULL") // "= NULL" never matches in SQL
	}
	result := query.Joins("OperatingSystem").Find(&pkgs)
	if result.Error != nil {
		return nil, result.Error
	}
	return pkgs, nil
}

Each shard of the store then accumulates into a full store object (for both the reader and the writer):

Example (again, not finalized)
// store.go within the db/v6 package

const vulnerabilityStoreFileName = "vulnerability.db"

type Store interface {
	StoreReader
	StoreWriter
}

type StoreReader interface {
	AffectedPackageStoreReader
	AffectedCPEStoreReader
	VulnerabilityStoreReader
	ProviderStoreReader
}

type StoreWriter interface {
	AffectedPackageStoreWriter
	AffectedCPEStoreWriter
	VulnerabilityStoreWriter
	ProviderStoreWriter
	io.Closer
}

type store struct {
	*affectedPackageStore
	*vulnerabilityStore
	*affectedCPEStore
	*providerStore
	cfg *StoreConfig
}

func New(cfg StoreConfig) (Store, error) {
	bs := newBlobStore(&cfg)
	return &store{
		cfg:                  &cfg,
		affectedPackageStore: newAffectedPackageStore(&cfg, bs),
		affectedCPEStore:     newAffectedCPEStore(&cfg, bs),
		vulnerabilityStore:   newVulnerabilityStore(&cfg, bs),
		providerStore:        newProviderStore(&cfg),
	}, nil
}

A store implementation is provided for all available database objects, embedded into the final Store interface.

The DB search client queries and refines the final set of vulnerability candidates (using the DB-specific store reader injected into the client). The search methods access the DB with the raw sqlite models, including the ability to optionally fetch associated blob values -- this deferral is critical to the performance gains.
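The deferral can be illustrated with a small, self-contained sketch (the row and detail types here are hypothetical, not the real v6 models): filtering happens against the cheap indexed column, and the serialized blob is only decoded for rows that survive the filter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// row models an indexed table entry: a small searchable column plus a
// serialized blob that is expensive to deserialize.
type row struct {
	PackageName string // indexed column, cheap to compare
	Blob        []byte // JSON detail, only decoded on demand
}

// detail is the deserialized form of the blob.
type detail struct {
	Constraint string `json:"constraint"`
}

// search filters on the indexed column first; rows that do not match are
// skipped without ever touching their blobs.
func search(rows []row, name string) ([]detail, error) {
	var out []detail
	for _, r := range rows {
		if r.PackageName != name {
			continue // filtered on the index alone; blob never decoded
		}
		var d detail
		if err := json.Unmarshal(r.Blob, &d); err != nil {
			return nil, err
		}
		out = append(out, d)
	}
	return out, nil
}

func main() {
	rows := []row{
		{PackageName: "openssl", Blob: []byte(`{"constraint":"< 3.0.1"}`)},
		{PackageName: "zlib", Blob: []byte(`{"constraint":"< 1.3"}`)},
	}
	ds, _ := search(rows, "openssl")
	fmt.Println(ds[0].Constraint)
}
```

With many candidate rows and large blobs, the savings come from how few blobs ever reach the JSON decoder.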

This implies incremental additions of the store shards above, each with ways to read and write entries to and from the DB.

It also implies that a new search client needs to be implemented with the existing (common) criteria:

  • Add ByCPECriteria
  • Add ByLanguageCriteria
  • Add ByDistroCriteria

Ideally, all of these changes are done incrementally and do not affect the existing v5 implementation; we should only remove the v5 implementation when we are ready to cut over to v6. This also implies that we should consider making the search client a shared concern, with the criteria implemented within each DB schema -- this is still open to design options.
