Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Construct export tool (a compiler target) #377

Open
arekbulski opened this issue Mar 12, 2018 · 150 comments
Open

Construct export tool (a compiler target) #377

arekbulski opened this issue Mar 12, 2018 · 150 comments
Assignees

Comments

@arekbulski
Copy link
Member

arekbulski commented Mar 12, 2018

After investigating the issue a bit more closely (and I admit that I am far from implementing it...), I came to a conclusion that the Construct->Kaitai tool should be implemented on Construct side, but the Kaitai->Construct export tool should be implemented on your side. I will need some assisstance with it, namely with the compiler since I dont speak Scala at all. I can provide you with examples, what the compiler should spew out, but I probably wont be able to write Scala code for it.

I foresee one (admittedly huge problem): Construct API is not stable (at least at the moment). This seems to be mostly affecting classes that are brand new, but there were also some changes to classes that existed before I took over the project. If it would take Kaitai a year to ship changes, thats kind of a problem. I suppose the users could use the newest version from GitHub (build the compiler from sources), so that would somewhat mitigate if not solve the issue.

I attach the map of completion, to be edited later (updated manually):
38455241-3b627b64-3a6d-11e8-8fe3-19726c3ef877

Current translations and CI results (updated automatically):
https://github.com/kaitai-io/ci_targets/tree/master/compiled/construct
http://kaitai.io/ci/

@GreyCat
Copy link
Member

GreyCat commented Mar 12, 2018

I suspect that Construct target would be very different from what's usually generated, i.e. we'll need to match declarative constructions, not procedural statements. That's totally possible too, though.

#253 is progressing smoothly, so chance are it would be much easier to do all the tests after it would be completed. Actually, it can be used right now, we just need to add relevant code to generate tests, Python's translator probably should be fine as is?

@arekbulski
Copy link
Member Author

I will start working on both import/export tools then, and should deliver a first batch of translated examples within say a few days. Can I count on you to handle writing the compiler scala code?

To return the favor, I will get on top of some other work items that I assigned myself to and kind of had it stuck in the backlog, most importantly the Construct->Kaitai export tool which will be implemented entirely on Construct side. Actually its very similar to the compiler infrastructure, except the pyYAML module would be generating the output instead of a custom text-writer class.

@GreyCat
Copy link
Member

GreyCat commented Mar 13, 2018

Can I count on you to handle writing the compiler scala code?

Sure :)

@arekbulski
Copy link
Member Author

Would it be more convenient if structs were created kind of imperatively like this?

s = []
s.append('fieldname' / Int8ub)
return Struct(*s)

instead of proper syntax?

Struct(
'fieldname' / Int8ub,
)

@GreyCat
Copy link
Member

GreyCat commented Mar 14, 2018

Probably it won't matter, as this would be whole new Compiler, akin to GraphVizCompiler, which would not use existing templating mechanism (=does not need to fit into existing header-contents-footer workflow).

@arekbulski
Copy link
Member Author

Could the construct-compiler (lets call it the export tool?) have a weekly release cycle?

@GreyCat
Copy link
Member

GreyCat commented Mar 16, 2018

I don't quite understand what you call "release cycle". At least in the near foreseeable future, "releasing" would probably remain relatively time-consuming and thus not very common task — mostly because it means lots of PR, writing announces, posting announces on the news sites, etc.

On the other hand, if you just want current builds, they're available right now, both for Debian & Windows. Would that qualify for "wait for 5-7 minutes after the commit" release cycle?

@arekbulski
Copy link
Member Author

Yes that is exactly what I meant. So this solves the issue of Construct having unstable API.

@arekbulski
Copy link
Member Author

Progress report on the import tool:

    def test_exportksy():
        d = Struct(
            "num1" / BytesInteger(4),
            "num2" / Int32ub,
            "num2" / Float32b,
            "data1" / Bytes(4),
            "data2" / GreedyBytes,
            "array2d" / Array(5, Array(5, BytesInteger(1)))
        )
        d.export_ksy()
--------------------------------------- Captured stdout call ----------------------------------------
meta:
  id: unnamed_schema
seq:
- id: num1
  type: u4be
- id: num2
  type: u4be
- id: num2
  type: f4be
- id: data1
  size: 4
- id: data2
  size-eos: true
- id: array2d
  repeat: expr
  repeat-expr: 5
  type: type_0
types:
  type_0:
    seq:
    - id: x
      repeat: expr
      repeat-expr: 5
      type: u1be

@arekbulski
Copy link
Member Author

Switched from pyYAML to ruamelYAML, it fixed the order of keys.

@arekbulski
Copy link
Member Author

Would that be valid analog to Prefixed?

- id: lengthfield
type: u4le
- id: data
size: lengthfield
type: xxx

@GreyCat
Copy link
Member

GreyCat commented Mar 16, 2018

If that's going to be just a byte array, just drop type: xxx, that would be raw byte array.

@arekbulski
Copy link
Member Author

No, that would be Bytes analog, I am asking about Prefixed.
https://construct.readthedocs.io/en/latest/api/tunneling.html#construct.Prefixed

@GreyCat
Copy link
Member

GreyCat commented Mar 16, 2018

Um, then just use what you've proposed — it should work ;)

@arekbulski
Copy link
Member Author

Do I get it right, that fields can have doc tag but entire struct cannot?

@GreyCat
Copy link
Member

GreyCat commented Mar 16, 2018

Typespec (that's probably what you call "entire struct") can have doc + doc-ref.

@arekbulski
Copy link
Member Author

I was thinking of something on meta level but closest thing I found was title.
Typespec would I think be a nested struct, not outer-most struct.

@arekbulski
Copy link
Member Author

Uhm... what would be the analog of Pass?
https://construct.readthedocs.io/en/latest/api/streaming.html#construct.Pass

@GreyCat
Copy link
Member

GreyCat commented Mar 17, 2018

Probably there won't be direct equivalent. Default case for switch is _. There is no "default case" for enums, as (1) they work pretty different from Construct implementation, i.e. then don't convert integers <-> strings, but integers <-> constants, (2) currently their implementation is language-dependent.

@arekbulski
Copy link
Member Author

Would size: 0 be a valid attributespec?

@GreyCat
Copy link
Member

GreyCat commented Mar 17, 2018 via email

@arekbulski
Copy link
Member Author

@GreyCat
The docs say that AttributeSpec tags "must" come in specified order. Is that really a hard requirement or just a way of emphasizing the importance of a style guide?

@GreyCat
Copy link
Member

GreyCat commented Mar 17, 2018

"Style guide" is called style guide for a reason of enforcing certain style: where several different behaviors are technically possible, we suggest one to follow. If someone does not do what style guide suggests, it fails to comply with style guide, but the code could be still compilable. If someone does not do what language reference dictates, most likely it will result in compilation error.

So, the short answer: tags inside attribute spec can come in any order. Compiler actually has no way to even know of that order, it just gets it as unordered map.

@arekbulski
Copy link
Member Author

Alright, for the record this is what the style guide says:
"When specifying an attribute, one MUST use the following order of keys"

@GreyCat
Copy link
Member

GreyCat commented Mar 17, 2018

Exactly. That "MUST" is to be interpreted as "in order to make style-compliant ksy, you must do that".

@KOLANICH
Copy link

KOLANICH commented Mar 18, 2018

I guess the order should be user-controllable by passing 3 lists, one is for properties (in seq, instances, enums, params), one is for types, and one is for meta.
For example ["id", "-orig-id", "type", ...] if we wanna comply with the style guide.

@GreyCat
Copy link
Member

GreyCat commented Mar 18, 2018

I'm not sure that style compliance is worth pursuing, at least now. For example, PyYAML generates arrays-in-maps like that:

seq:
- id: foo
  type: bar

while style guide suggests:

seq:
  - id: foo
    type: bar

@arekbulski
Copy link
Member Author

Actually I was thinking af exactly the same thing, but after a beer, hugging the pillow seemed more important than writing a reply. 😴

_io _root _params it is.

@GreyCat
Copy link
Member

GreyCat commented Mar 31, 2018

JFYI, after last updates, all Construct tests fail with something like

Traceback (most recent call last):
  File "/home/travis/build/kaitai-io/kaitai_struct/tests/spec/construct/test_cast_to_imported.py", line 9, in test_cast_to_imported
    r = _schema.parse_file('src/fixed_struct.bin')
  File "/home/travis/.local/lib/python2.7/site-packages/construct/core.py", line 328, in parse_file
    return self.parse_stream(f, **contextkw)
  File "/home/travis/.local/lib/python2.7/site-packages/construct/core.py", line 319, in parse_stream
    return self._parsereport(stream, context, "(parsing)")
  File "/home/travis/.local/lib/python2.7/site-packages/construct/core.py", line 331, in _parsereport
    obj = self._parse(stream, context, path)
  File "/home/travis/.local/lib/python2.7/site-packages/construct/core.py", line 2081, in _parse
    obj = Container(_stream = FeaturedBytesIO(stream))
  File "/home/travis/.local/lib/python2.7/site-packages/construct/core.py", line 141, in FeaturedBytesIO
    stream.size = lambda: _size_stream(stream)
AttributeError: 'file' object has no attribute 'size'

@arekbulski
Copy link
Member Author

arekbulski commented Mar 31, 2018

Problem was fixed (I will have the fix ship to pypi within hours). You need to wrap the _io stream with the following functions. Its not possible to add new attributes to file streams (on Python 2):

def stream_seek(stream, offset, whence=0):
    try:
        return stream.seek(offset, whence)
    except Exception:
        raise StreamError("stream.seek() failed, offset %s, whence %s" % (offset, whence,))

def stream_tell(stream):
    try:
        return stream.tell()
    except Exception:
        raise StreamError("stream.tell() failed")

def stream_size(stream):
    fallback = stream.tell()
    end = stream.seek(0, 2)
    stream.seek(fallback)
    return end

def stream_iseof(stream):
    fallback = stream.tell()
    data = stream.read(1)
    stream.seek(fallback)
    return not data

The context entries were also renamed to _root _params _io.

@GreyCat
Copy link
Member

GreyCat commented Mar 31, 2018

Ok, understood, thanks! Will apply the fix now... We really need to get construct3 and python3 fixed on CI...

@arekbulski
Copy link
Member Author

Just shipped the aforementioned fixes to pypi.

@GreyCat
Copy link
Member

GreyCat commented Mar 31, 2018

Applied the fix, we seem to be at 44% now.

@arekbulski
Copy link
Member Author

Could you look into these 2?

  • NavRoot: _root is a context entry (not a global variable) so its this._root, just like Computed lambda.
  • NavParentOverride: there should be 2 different expressions for parent (up level) because the data field is double nested when in nav_parent_override__mediator which would make it this._._.child_size

@GreyCat
Copy link
Member

GreyCat commented Apr 7, 2018

NavRoot

Seems to be fixed now.

NavParentOverride

Um, no, that's the whole point of "parent override", that you can keep having same _parent expression, but in the generated instancing we can pass not a normal self as parent, but something else in some cases. I.e. for Python, normal invocation without override would be:

self.child_1 = self._root.Child(self._io, self, self._root)

and with parent override:

self.child_2 = self._root.Child(self._io, self._parent, self._root)

Note that 2nd argument (and that is parent we pass into the user type) is different than normal self.

If that's hard to implement, I'd suggest to just skip this whole "parent override" idea for Construct target, it's not critical by any means.

@arekbulski
Copy link
Member Author

Uh, could you simply add a "_"/Computed(...) field that would override the context entry?

@GreyCat
Copy link
Member

GreyCat commented Apr 7, 2018

Ah, ok, probably I've understood what you mean here... Let's try it...

@GreyCat
Copy link
Member

GreyCat commented Apr 7, 2018

It would probably even work in this particular case, but the main problem is actually that this Computed(...) needs to be evaluated in context of parent, on in context of child where we'll be adding that field to.

@arekbulski
Copy link
Member Author

I will get you Process XOR/rotate and few other formats translated in short time. From what I understand the CI page, the Enums and Switches are mostly red because KST doesnt generate the test cases?

@GreyCat
Copy link
Member

GreyCat commented Apr 7, 2018

the Enums and Switches are mostly red because KST doesnt generate the test cases?

Right. Actually, here's the new coverage chart:

coverage_construct2

@arekbulski
Copy link
Member Author

arekbulski commented Apr 7, 2018

How do you feel about the two of use working on Construct today? I will try not to be so sloppy.

@GreyCat
Copy link
Member

GreyCat commented Apr 7, 2018

Unfortunately, I only appear here from time to time today and can do only some small pieces of work.

Next big thing on my schedule is probably (1) porting the rest of switch specs to KST, (2) implementing nested type accessing for enum literals, and then (3) porting enums to KST. That would probably give us ~80-90% coverage for KST tests, and that should make adoption of any new emerging targets (Construct, Go, Rust, may be C) much easier.

Feel free to suggest whatever else I can fix in Construct generation. There are plenty of "red" things on that chart above that need fixing. Probably stuff like expr_*, float_to_i, process_* should be relatively easy to fix.

@arekbulski
Copy link
Member Author

Okie dokie, I was just asking for someone to play with me with my toys. 😄

I pushed a new commit with translated formats: kaitai-io/kaitai_struct_tests@a49e697.
On Construct side: I need to add ProcessRotateLeft class, in Pointer add stream parameter (so you select a stream), and in Struct have _index copied over so you can refer to it directly. I will let you know immediately when that is implemented and shipped to pypi.

@arekbulski
Copy link
Member Author

arekbulski commented Apr 7, 2018

I kinda forsee a problem with io tag support (selecting stream). Can this feature be only used with pointer instances (those with pos) or with any tag, including seq?
The ParentOverride is also a problem that I am unable to solve at this time.

@GreyCat
Copy link
Member

GreyCat commented Apr 8, 2018

I kinda forsee a problem with io tag support (selecting stream). Can this feature be only used with pointer instances (those with pos) or with any tag, including seq?

Only with instances with pos. Sequence attributes are indeed dubbed "sequence" because they define data in sequence in the same stream.

This one is relatively important, as stream selection is used by a lot of production format specs. 26 out of 110 ksy files in formats repo use it.

The ParentOverride is also a problem that I am unable to solve at this time.

Don't bother, really. This functionality is largerly superseeded by passing arbitrary parameters, so I don't think that anyone uses it in reality.

@arekbulski
Copy link
Member Author

Okie dokie, then we are at actionable consensus. I will add streamfunc parameter to Pointer class.

@arekbulski
Copy link
Member Author

Construct updates were shipped to pypi.

@GreyCat
Copy link
Member

GreyCat commented Aug 12, 2018

@arekbulski, @KOLANICH — is there any chance that you'd be still interested in this issue?

I've finally completed enum support in KST, this can be leveraged to add another huge chunk to construct target support.

@arekbulski
Copy link
Member Author

Construct is no longer actively developed. I am planning to resume workin on it for a few days, finish what was left unfinished and then freeze it. So no atm and yes possibly in the future.

@arekbulski
Copy link
Member Author

arekbulski commented Feb 3, 2021

Also, note that, unfortunately, @arekbulski stopped working on Construct: #377 (comment), so most likely Construct target will be phased out in some time :(

@GreyCat

You make it sound as if I removed Construct from the ecosystem. I said that I stopped working on it ie. I stopped adding new features. All existing features (and there is a lot of them) are still fully functional. Please do not drop support for Construct. Few days ago I resumed work on it. Currently there are 0 outstanding tickets. And the KSY export tool is still there too.

Looking at the CI page: I think you can remove "construct2" because I dropped support for 2.7 long time ago. Currently Construct runs only on 3.6+. Unless I misunderstood what the "2" stands for.

@Mimickal
Copy link

I may have found a bug in the Construct exporter. I have a type that depends on an enum, but the type is output in the Python before the enum. Since Python reads top-to-bottom, it raises a NameError about an undefined symbol.

Here is a minimal example:

# The input ksy
meta:
  id: example
  endian: le
seq:
  - id: video
    type: video_settings
types:
  video_settings:
    seq:
      - id: texture_quality
        type: u1
        enum: quality_options
enums:
  quality_options:
    0: low
    1: medium
    2: high
# The output Python
from construct import *
from construct.lib import *

example__video_settings = Struct(
	'texture_quality' / example__quality_options(Int8ub),
)

# This is required above!
def example__quality_options(subcon):
	return Enum(subcon,
		low=0,
		medium=1,
		high=2,
	)

example = Struct(
	'video' / LazyBound(lambda: example__video_settings),
)

_schema = example

Digging into the exporter code, it looks like it just always outputs types before enums https://github.com/kaitai-io/kaitai_struct_compiler/blob/0acfa60452a10b770a2c2e746f3de057f8ab76d9/shared/src/main/scala/io/kaitai/struct/ConstructClassCompiler.scala?ts=4#L36
Would reordering these two calls break anything else?

@Mimickal
Copy link

Mimickal commented Jan 30, 2022

How do you use parameterized types with Construct? Are they currently implemented?

I have this ksy that represents a range with arbitrary numeric type.

meta:
  id: range
  endian: le
params:
  - id: type_key
    type: str
seq:
  - id: min
    type: &type
      switch-on: type_key
      cases:
        '"f4"': f4
  - id: max
    type: *type

I use it like this:

meta:
  id: wind
  endian: le
  imports:
    - common/range
seq:
  - id: velocity
    type: range("f4")

I expect Wind to give Range that "f4" value, and indeed in Python it does:

# Stripped for brevity
# range.py
class Range(KaitaiStruct):
    def __init__(self, type_key, _io, _parent=None, _root=None):
        self.type_key = type_key

	def _read(self):
        _on = self.type_key
        if _on == u"f4":
			self.min = self._io.read_f4le()

# wind.py
from structs import range
class Wind(KaitaiStruct):
def _read(self):
        self.velocity = range.Range(u"f4", self._io)

In Construct, Range does appear to switch on type_key, but Wind does not seem to give Range the expected "f4" type_key. (Note, the import statement comes from my PR kaitai-io/kaitai_struct_compiler#242)

# range.py
range = Struct(
	'min' / Switch(this.type_key, {u"f4": Float32l, }),
	'max' / Switch(this.type_key, {u"f4": Float32l, }),
)

# wind.py
from structs.range import * # Added in my PR

wind = Struct(
	'velocity' / LazyBound(lambda: range), # Where is "f4"?
)

When attempting to load Wind data, I get KeyError: 'type_key'. I'm not quite sure where you would even add the expected type_key in Wind. According to the Construct docs, LazyBound does not take a parameter.

generalmimon added a commit to kaitai-io/kaitai_struct_docker_images that referenced this issue Dec 29, 2024
The last version of Construct supporting Python 2.7 (and Python 3.5) was
[2.10.54](https://pypi.org/project/construct/2.10.54/), all versions
starting with [2.10.56](https://pypi.org/project/construct/2.10.56/)
released on Jan 28, 2020 only support Python 3.6+. So our
`construct/python{2.7,3.4}` images were apparently always pulling
Construct 2.10.54 from PyPI, because that was the latest version
supporting these old Python versions. It makes little sense to test if
our new code works with a 5-year old version of Construct on an EoL
version of Python, let's move on.

The number of Construct users running Python 2 looks minimal, see
https://pypistats.org/packages/construct. There is roughly ~70k daily
downloads of the `construct` PyPI package on Python 3 and only 120
downloads on Python 2.

Also, @arekbulski confirmed in 2021 that Construct only supports Python
3.6+ and therefore it makes no sense to test the Construct target on
Python 2:
kaitai-io/kaitai_struct#377 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants