Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update ScannerCapability, enabled_capabilities, HarborSbomRepor… #17

Merged
merged 1 commit into from
Jan 17, 2024

Conversation

zyyw
Copy link
Collaborator

@zyyw zyyw commented Jan 12, 2024

…t and sbom data spec

@zyyw
Copy link
Collaborator Author

zyyw commented Jan 12, 2024

Hi @knqyf263 we have some updates on the scanner spec 1.2. Please help to review it. Thanks!

Summary of change points:

  1. add additional_attributes in ScannerCapability of ScannerCapability returned by sending a request to /metadata. Here the /metadata will also return the supported_media_types for the type of SBOM
  2. add accept_media_types for parameter of enabled_capabilities to indicate accept_media_types for the type of SBOM, when sending a request to /scan
  3. in HarborSbomReport, the media_type and sbom field are wrapped into as an array of SbomData object of data, when sending a request to /scan/{scan_request_id}/report

cc: @wy65701436 , @stonezdj

@zyyw
Copy link
Collaborator Author

zyyw commented Jan 12, 2024

cc: @chlins

Comment on lines 528 to 533
data:
type: array
items:
$ref: '#/components/schemas/SbomData'
additionalProperties: true
description: 'The raw data of the sbom generated by the scanner and its format.'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not easy for us to implement this API. Currently, each report is stored separately. Trivy doesn't generate both SPDX and CycloneDX together. We prefer returning one report at /scan/{scan_request_id}/report. Or it is better to return all reports (vulnerability and SBOM) together.

This suggestion is as follows:

  • Vulnerability and SBOM reports are separately returned
  • SPDX and CycloneDX reports are aggregated.

It looks inconsistent. I'd suggest

  • Vulnerability, SPDX SBOM and CycloneDX SBOM are returned separately.

Or

  • Vulnerability, SPDX SBOM and CycloneDX SBOM are returned together

What do you think?

Copy link

@stonezdj stonezdj Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API doesn't require the backend implementation to store the information with requestID, you could stored each report with the key <requestID>:<media_type>. then you can store whatever vulnerability, sbom/spdx or sbom/cyclonedx

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knqyf263 , the /scan/{scan_request_id}/report is identified by scan_request_id and Accept header (example: application/vnd.security.vulnerability.report; version=1.1 or application/vnd.security.sbom.report+json; version=1.0). That's the reason why SPDX SBOM report and CycloneDX SBOM report are aggregated.
We do understand that "Trivy doesn't generate both SPDX and CycloneDX together". For example, when /scan is requested like below:

{
  "enabled_capabilities": [
    {
      "type": "sbom",
      "produces_mime_types": [
        "application/vnd.security.sbom.report+json; version=1.0"
      ],
      "accept_media_types": [
        "application/spdx+json",
        "application/vnd.cyclonedx+json"
      ]
    }
  ]
}

trivy-adapter may need to run trivy CLI twice passing different parameters to generate different format of SBOM report.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API doesn't require the backend implementation to store the information with requestID, you could stored each report with the key :<media_type>. then you can store whatever vulnerability, sbom/spdx or sbom/cyclonedx

They all have different statuses, like spdx: completed, cyclonedx: in progress, but the API forces scanners to aggregate results. The specification should define what should be returned in that case. There are more considerations to take into account, so it is closer to a new API rather than an extension of the current API.

That's the reason why SPDX SBOM report and CycloneDX SBOM report are aggregated.

We can do that, but it complicates implementation. If you pass the media type to /scan/{scan_request_id}/report, it simply returns one report. Does Harbor benefit from this specification? If you have a big advantage on this API, I'm ok, but it makes implementation more complicated for scanners and brings more considerations as mentioned above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knqyf263 if the /scan request asks for both SPDX sbom and Cyclonedx sbom, like below:

{
  "enabled_capabilities": [
    {
      "type": "sbom",
      "produces_mime_types": [
        "application/vnd.security.sbom.report+json; version=1.0"
      ],
      "accept_media_types": [
        "application/spdx+json",
        "application/vnd.cyclonedx+json"
      ]
    }
  ]
}

and if we pass the Accept-Media-Type header (application/spdx+json or application/vnd.cyclonedx+json) to /scan/{scan_request_id}/report so that it simply returns one report at one http request to /scan/{scan_request_id}/report, there will be one {scan_request_id} used in two requests to /scan/{scan_request_id}/report. Can the one {scan_request_id} be used two times?

Copy link

@knqyf263 knqyf263 Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scan_request_id is already used multiple times for vulnerabilities and SBOM.

If I understand correctly, the current suggestion is as follows (Option 1). The same scan_request_id is used twice.

  • scan_request_id (e.g. ABCDE12345) + mime_type
    • ABCDE12345 + application/vnd.security.vulnerability.report+json; version=1.1
      • Return one report
    • ABCDE12345 + application/vnd.security.sbom.report+json; version=1.0
      • Return two reports (SPDX and CycloneDX) in one request

Would the following approach (Option 2) not work?

  • scan_request_id (e.g. ABCDE12345) + mime_type (+ media_type)
    • ABCDE12345 + application/vnd.security.vulnerability.report+json; version=1.1
      • Return one report
    • ABCDE12345 + application/vnd.security.sbom.report+json; version=1.0 + application/spdx+json
      • Return one report (SPDX SBOM)
    • ABCDE12345 + application/vnd.security.sbom.report+json; version=1.0 + application/vnd.cyclonedx+json
      • Return one report (CycloneDX SBOM)

The same ID can be used three times.

Or aggregate all reports (Option 3). scan_request_id will be unique.

  • scan_request_id (e.g. ABCDE12345)
    • ABCDE12345
      • Return three reports (vuln, SPDX and CycloneDX)

I'm just curious Option 1 has any advantages over Option 2 and Option 3. Option 2 is simpler from the implementation perspective (it is already done in aquasecurity/harbor-scanner-trivy#422). But again, if you see benefits in Option 1, I'm ok with the approach.

Copy link
Collaborator Author

@zyyw zyyw Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knqyf263 if we go with the Option 2 you mentioned above, but having media_type (application/spdx+json or application/vnd.cyclonedx+json) as optional query parameter in /scan/{scan_request_id}/report, for example /scan/{scan_request_id}/report?media_type=application/spdx+json (the / in application/spdx+json might be encoded), is it okay with you?

  • /scan/{scan_request_id}/report
    • return one report for vulnerabilities
  • /scan/{scan_request_id}/report?media_type=application/spdx+json
    • return one report of SPDX SBOM
  • /scan/{scan_request_id}/report?media_type=application/vnd.cyclonedx+json
    • return one report of CycloneDX

Once you confirm it, i'll update the PR.
Thanks

Copy link

@knqyf263 knqyf263 Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it works. I tried to express "optional" by (+ media_type). But all of the above options work. You can decide it. I just tried to understand the pros/cons.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knqyf263 , Thank you for the confirmation! We think your suggestion of Option 2 makes sense and it makes the process of getting scan report request much simpler. Updated the PR accordingly.
Please help to review it. Appreciated!

@zyyw zyyw force-pushed the update-spec-1.2 branch 3 times, most recently from 31db260 to 1cbaa54 Compare January 16, 2024 03:14
wy65701436
wy65701436 previously approved these changes Jan 16, 2024
Copy link
Contributor

@wy65701436 wy65701436 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

stonezdj
stonezdj previously approved these changes Jan 16, 2024
}
]
properties:
$ref: "#/components/schemas/ScannerProperties"
description: |
Represents metadata of a Scanner Adapter which allows Harbor to lookup a scanner capable
Represents metadata of a Scanner Adapter which allows Harbor to lookup a scanner capabilities

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capability

Copy link

@knqyf263 knqyf263 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

chlins
chlins previously approved these changes Jan 16, 2024
Copy link
Member

@chlins chlins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@zyyw zyyw dismissed stale reviews from chlins, stonezdj, and wy65701436 via 776efab January 16, 2024 09:52
@zyyw
Copy link
Collaborator Author

zyyw commented Jan 16, 2024

@knqyf263 for awareness, made a few subtle changes after your approval on this PR:

stonezdj
stonezdj previously approved these changes Jan 17, 2024
and add /scan/{scan_request_id}/report 400 response code

Signed-off-by: Shengwen Yu <[email protected]>
Copy link
Contributor

@wy65701436 wy65701436 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@zyyw zyyw merged commit 7aad47c into goharbor:master Jan 17, 2024
1 check passed
@knqyf263
Copy link

knqyf263 commented Jan 18, 2024

After I started implementing this spec, another question came to my mind.

additional_attributes is defined in #/components/schemas/ScannerCapability is defined as below.

additional_attributes:
type: object
descriptions: The additional attributes for scanner capabilities. If the type is sbom, then it returns supported media types of the SBOM format.
example: |
{
"sbom_media_types": [
"application/spdx+json",
"application/vnd.cyclonedx+json"
]
}

Is there any reason another parameter name (parameters) is used here? (additional_attributes vs parameters)

parameters:
$ref: '#/components/schemas/SbomParameters'
description: The additional parameters for the scan request, for the SBOM type, harbor will carry with `sbom_media_types` to specify the expected formats for SBOM content.
example: |
{
"sbom_media_types": [
"application/spdx+json",
"application/vnd.cyclonedx+json"
]
}

It is not a big deal, but I feel like using the same name is more straightforward because their relationship is "supported capabilities" and "enabled capabilities". They refer to essentially the same thing. What do you think? @zyyw

@zyyw
Copy link
Collaborator Author

zyyw commented Jan 18, 2024

Hi @knqyf263 , regarding additional_attributes and parameters, they do essentially refer to the same thing:

{ 
 "sbom_media_types": [ 
   "application/spdx+json", 
   "application/vnd.cyclonedx+json" 
 ] 
} 

The main reason of using two different terms refer to the same value is that additional_attributes serves as a part of the response data of a request to /metadata, while parameters serves as a part of request parameter to ScanRequest of /scan and it's not a response data indicating some additional attributes of an entity.

@knqyf263
Copy link

Yes, it makes sense. parameters might sound weird for /metadata, but this is because the request uses the term parameters. What if finding a better name that can be used for both, like additional_properties? For example, produces_mime_types is used in request and response. If I understand correctly, your answer is based on the current naming, additional_attributes and parameters, and it's not opposed to using the same name, right?

@zyyw
Copy link
Collaborator Author

zyyw commented Jan 18, 2024

the additional_attributes is to wrap up all extra attributes of the response from a request to /metadata, while parameters indicating parameters of a request of ScanRequest to /scan. Currently, they are essentially referring the same value. But as the scanner capabilities extends in the future, additional_attributes and parameters might refer to different things.

@knqyf263
Copy link

Thanks for your explanation. OK, I'll implement it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants