clarify "same object state" of version block (E066) #571

Closed
srerickson opened this issue Dec 30, 2021 · 5 comments · Fixed by #586
Labels: Editorial (editorial issues, no changes to intent), Needs Discussion, OCFL Object

Comments

@srerickson
Contributor

srerickson commented Dec 30, 2021

I have a question about this line from the spec:

E066: Each version block in each prior inventory file MUST represent the same object state as the corresponding version block in the current inventory file

As I understand it, the digest algorithm can change from one inventory to the next, which means the digests in the version blocks can change too. If that's true, isn't the sense of "sameness" in this statement somewhat ambiguous? It may help to explain what makes two version blocks the same even when their digests differ, with something like: the same bytestream used to generate a given digest in the version block of the prior inventory must be used to generate the corresponding digest in the version block of the new inventory.
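To illustrate the ambiguity, here is a minimal Python sketch. It assumes, as the comment above does, that a later inventory may switch its `digestAlgorithm` (OCFL permits `sha512` and `sha256` for inventories). The file content is a hypothetical placeholder. The same bytestream produces different digest strings, so a literal comparison of the state JSON fails even though the logical state is unchanged.

```python
import hashlib

# Hypothetical content of a single file; any bytestream works.
content = b"Hello, OCFL!\n"

# The same bytestream digested with the two inventory algorithms OCFL allows.
sha512_digest = hashlib.sha512(content).hexdigest()
sha256_digest = hashlib.sha256(content).hexdigest()

# A prior inventory might key this file under its sha512 digest, while the
# current inventory keys it under its sha256 digest.
prior_state = {sha512_digest: ["file.txt"]}
current_state = {sha256_digest: ["file.txt"]}

# The JSON differs, but file.txt still refers to exactly the same bytes.
print(prior_state == current_state)            # False
print(len(sha512_digest), len(sha256_digest))  # 128 64
```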

@pwinckles

Let's say your versions block for v1 of an object looks like the following:

  "versions": {
    "v1": {
      "created": "2018-10-02T12:00:00Z",
      "message": "version one",
      "state": {
        "7545b8...f67": [ "file.txt" ],
        "12b348...9ac": [ "file2.txt" ]
      },
      "user": {
        "address": "[email protected]",
        "name": "Alice"
      }
    }
  }

I believe that section of the spec is meant to ensure that later versions don't do something like the following:

  "versions": {
    "v1": {
      "created": "2018-10-02T12:00:00Z",
      "message": "version one",
      "state": {
        "7545b8...f67": [ "file2.txt" ],
        "12b348...9ac": [ "file.txt" ]
      },
      "user": {
        "address": "[email protected]",
        "name": "Alice"
      }
    },
    "v2": {
      "created": "2018-10-02T12:00:00Z",
      "message": "version two",
      "state": {
        "7545b8...f67": [ "file2.txt" ],
        "12b348...9ac": [ "file.txt" ],
        "3b456a...111": [ "file3.txt" ]
      },
      "user": {
        "address": "[email protected]",
        "name": "Alice"
      }
    }
  }

In this case, both the v1 and v2 inventories would validate in isolation. However, the v2 inventory is invalid by E066 because its v1 block swaps the logical paths of file.txt and file2.txt, which changes the state of v1.
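A minimal Python sketch of how a validator might catch the swap in this example. It assumes both inventories use the same digest algorithm, so digests can be compared directly, and it treats the elided digests from the example as opaque placeholder strings. The function names are hypothetical. Each state is inverted to a logical path → digest map, so the order of paths within a state entry does not affect the result.

```python
def path_to_digest(state):
    """Invert an OCFL version state (digest -> [logical paths])
    into a map of logical path -> digest."""
    return {path: digest for digest, paths in state.items() for path in paths}


def same_state_same_algorithm(prior_state, current_state):
    # Only meaningful when both inventories use the same digestAlgorithm.
    return path_to_digest(prior_state) == path_to_digest(current_state)


# v1 as recorded in the v1 inventory, and v1 as recorded in the v2 inventory.
v1_prior = {"7545b8...f67": ["file.txt"], "12b348...9ac": ["file2.txt"]}
v1_in_v2 = {"7545b8...f67": ["file2.txt"], "12b348...9ac": ["file.txt"]}

print(same_state_same_algorithm(v1_prior, v1_prior))  # True
print(same_state_same_algorithm(v1_prior, v1_in_v2))  # False
```

This check breaks as soon as the digest algorithm changes between inventories, which is why it cannot serve as a general E066 test on its own.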

I think the text that you suggested is too focused on accounting for the case where the inventory digest algorithm changes, which is not necessary for there to be a violation of E066.

@srerickson
Contributor Author

Thanks for the response @pwinckles. Do you think the language of E066 could be improved by stating explicitly that a change in the digest algorithm is not a violation? I think it might, because of the ambiguity I described.

The concern I have is that it's easy for validator authors to misinterpret this part of the spec as saying that the JSON for version states should be equivalent across inventories, or to otherwise misinterpret "same object state" (that's based on personal experience 😀).

@pwinckles

Yes, I agree that the intent of "same object state" could be more clear.

zimeon added the Editorial (editorial issues, no changes to intent) and Needs Discussion labels Jan 21, 2022
zimeon added this to the 1.1 milestone Jan 21, 2022
@pwinckles

When (or if) this is addressed, perhaps the question of Unicode normalization could be addressed too? As noted in point three of #559:

The spec states "Each version block in each prior inventory file MUST represent the same object state as the corresponding version block in the current inventory file." In case of logical paths, is it up to the implementation to decide if this is a byte-for-byte comparison or a normalized comparison?
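To show why the choice matters, here is a short Python sketch using a hypothetical logical path. The same visible name can be encoded with a precomposed "é" (U+00E9) or with "e" followed by a combining acute accent (U+0301). A byte-for-byte comparison treats these as different paths. A normalized comparison treats them as the same path. NFC is used below only as an example normalization form. Which comparison E066 intends is the open question here.

```python
import unicodedata

# "café.txt" written two ways.
nfc_path = "caf\u00e9.txt"   # precomposed é
nfd_path = "cafe\u0301.txt"  # e + combining acute accent

# Byte-for-byte: the UTF-8 encodings differ, so the paths differ.
print(nfc_path.encode("utf-8") == nfd_path.encode("utf-8"))  # False

# Normalized (NFC here, as an example): the paths compare equal.
print(unicodedata.normalize("NFC", nfc_path)
      == unicodedata.normalize("NFC", nfd_path))             # True
```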

@zimeon
Contributor

zimeon commented Mar 18, 2022

I think this situation might be clearer if we changed the spec to say:

Each version block in each prior inventory file MUST represent the same object logical state as the corresponding version block in the current inventory file.

because we define "logical state" as logical paths tied to bitstreams, independent of the digest algorithm, whereas "object state" is not formally defined.

I agree that changes in digest algorithm between inventories are fine, and do not create a problem meeting this condition. For example, the approach my validator code uses to check for E066 is to create maps from logical paths, in a particular version state, to content files (thus taking the digests entirely out of the check).

(I do also have code that provides extra debugging info using digest values, in the case that the digest algorithms do match between versions, but that isn't necessary to detect an error)
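Below is a minimal Python sketch of the algorithm-independent approach described above. It is not zimeon's actual validator code. Each logical path in a version state is resolved, through that inventory's own manifest, to the set of content paths that hold its bitstream. The inventories are reduced to just the `manifest` and `versions` fields, and the digests (`sha512-a`, `sha256-a`, and so on) are hypothetical placeholders. Because content files are immutable, a logical path that shares any content path across the two inventories refers to the same bitstream in both, whatever digest algorithm each inventory used.

```python
def logical_to_content(inventory, version):
    """Map each logical path in `version` to the set of content paths
    holding its bitstream, using that inventory's own manifest."""
    manifest = inventory["manifest"]
    state = inventory["versions"][version]["state"]
    return {
        logical: frozenset(manifest[digest])
        for digest, logicals in state.items()
        for logical in logicals
    }


def same_logical_state(prior_inv, current_inv, version):
    prior = logical_to_content(prior_inv, version)
    current = logical_to_content(current_inv, version)
    if prior.keys() != current.keys():
        return False  # a logical path was added or removed
    # A shared content path means the logical path resolves to the same bytes.
    return all(prior[p] & current[p] for p in prior)


prior_inv = {  # hypothetical sha512 inventory
    "manifest": {"sha512-a": ["v1/content/file.txt"],
                 "sha512-b": ["v1/content/file2.txt"]},
    "versions": {"v1": {"state": {"sha512-a": ["file.txt"],
                                  "sha512-b": ["file2.txt"]}}},
}
current_ok = {  # same logical state, different (sha256) digests
    "manifest": {"sha256-a": ["v1/content/file.txt"],
                 "sha256-b": ["v1/content/file2.txt"]},
    "versions": {"v1": {"state": {"sha256-a": ["file.txt"],
                                  "sha256-b": ["file2.txt"]}}},
}
current_bad = {  # logical paths swapped, as in the earlier example in this thread
    "manifest": current_ok["manifest"],
    "versions": {"v1": {"state": {"sha256-a": ["file2.txt"],
                                  "sha256-b": ["file.txt"]}}},
}

print(same_logical_state(prior_inv, current_ok, "v1"))   # True
print(same_logical_state(prior_inv, current_bad, "v1"))  # False
```

Comparing sets of content paths, rather than single paths, allows for objects where the same bitstream is stored more than once. The sketch also assumes each inventory's manifest has already been validated separately.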


3 participants