CHANGES
1.13.0-dev.2 | 2024-12-19 17:08:09 +0100
* GH-1949: Fix codegen for string literals containing null bytes. (Benjamin Bannier, Corelight)
Our strings are UTF8 which can contain literal null bytes. We were
previously generating incorrect C++ for such literals. This was due to
us constructing C++ values via constructors taking `const char*` which is
expected to be null-terminated, i.e., everything after the terminator
was ignored even though it was emitted.
With this patch we switch to C++ literals for generating literal and
runtime strings.
Closes #1949.
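The truncation described above can be reproduced in plain C++. This is a minimal sketch, not the code the generator emits: the function names are made up for illustration, and the use of the `s` literal suffix is an assumption about what "C++ literals" means here.

```cpp
#include <string>

using namespace std::string_literals;

// Goes through the `const char*` constructor, which stops reading at the
// first null byte, so "cd" is dropped even though it is in the source.
inline std::string from_c_string() { return std::string("ab\0cd"); }

// Uses a C++ string literal with the `s` suffix (assumed here), which
// carries its own length and therefore keeps the embedded null byte.
inline std::string from_literal() { return "ab\0cd"s; }
```

The first function yields a two-character string, the second all five bytes.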
1.13.0-dev.0 | 2024-12-13 09:26:36 +0100
* GH-1901: Add consistent validation for attributes. (Evan Typanski, Corelight)
This updates attribute validation to see if attributes are in the
right places. The goal here is to have one place (i.e., a big set) to
answer the question "Can I use X attribute on Y node?"
There are a lot more moving parts with attribute validation, but those
generally have to do with behavior. A lot of that requires extra context
in the validator, which is exactly what the validator is meant to do.
Much of that is pretty ad-hoc, so it could get cleaned up as well.
1.12.0-dev.240 | 2024-12-12 13:09:05 +0100
* Make sure autogen-docs pre-commit hook can always run in CI. (Benjamin Bannier, Corelight)
1.12.0-dev.238 | 2024-12-10 14:53:29 +0100
* Update tutorial to use spicy-driver's batch mode. (Robin Sommer, Corelight)
* Allow defining parser alias names when running spicy-driver. (Robin Sommer, Corelight)
``--parser-alias <alias>=<name>`` now makes the parser of name
`<name>` accessible under an additional name `<alias>`. The option can
be used multiple times.
This is particularly useful when using input batches: by adding an
alias of the form `PORT/PROTOCOL` (e.g., `--parser-alias
80/tcp=MyHTTP::Message`), this allows defining a parser to use for
connections on that port (because the batch processing interprets any
ports in the trace as strings and tries to look up a parser of the
corresponding name). Adding `%orig` or `%resp` to the alias name
limits it to the corresponding direction (e.g., ``--parser-alias
80/tcp%orig=MyHTTP::Request``).
See the `spicy-driver` documentation for more details on how to use
this with batches.
* Let `spicy-{dump,driver}` print out parser alias names with ``-ll``. (Robin Sommer, Corelight)
``-list-parsers`` prints the direct names of all known parsers.
However, internally, we sometimes register additional alias names that
can also be used to look up a parser (e.g., if there's a `%mime-type`
or `%port` property). If one specifies ``-list-parsers`` twice now,
those aliases are printed as well.
* GH-1829: Catch integer shifts exceeding the width of the operand. (Robin Sommer, Corelight)
For constants, we now reject such shifts at compile time. At runtime,
we catch them by having the forked version of `SafeInt` throw an
overflow exception.
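The runtime side of this check can be pictured with a small C++ sketch. It does not use `SafeInt`; `checked_shl` is a hypothetical helper, and the exception type is an assumption chosen for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: shifts `value` left by `bits` and throws if the
// shift amount is not smaller than the width of the operand type.
template <typename T>
T checked_shl(T value, unsigned bits) {
    static_assert(std::is_unsigned_v<T>, "sketch covers unsigned types only");
    if (bits >= static_cast<unsigned>(std::numeric_limits<T>::digits))
        throw std::overflow_error("shift exceeds operand width");
    // Small types are promoted to int for the shift; cast back to T.
    return static_cast<T>(value << bits);
}
```

Shifting a `uint8_t` by 8 or a `uint32_t` by 32 throws here instead of running into undefined or surprising behavior.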
* Change `SafeInt` submodule to a fork including a fix. (Robin Sommer, Corelight)
We now use a fork of `SafeInt` that includes
https://github.com/dcleblanc/SafeInt/pull/64 (as well as a general bump
to upstream `master`).
* Add `max-lines` option to Sphinx' `spicy-output` directive. (Robin Sommer, Corelight)
This limits the lines of output shown to the given number, adding a
`...` marker if truncated.
* GH-1928: Deprecate `&anchor` with regular expression constructors. (Robin Sommer, Corelight)
* Update auto-generated architecture diagram. (Robin Sommer, Corelight)
Looks like some dependency upgrade changed the content slightly.
1.12.0-dev.226 | 2024-12-09 17:28:27 +0100
* Fix `HILTI_CXX_FLAGS` for when multiple flags are passed. (Benjamin Bannier, Corelight)
* Add helper function to perform shell-like string splitting. (Benjamin Bannier, Corelight)
1.12.0-dev.223 | 2024-12-09 11:54:03 +0100
* Fix ruff 'ISC' lints. (Benjamin Bannier, Corelight)
* Fix ruff 'I' lints. (Benjamin Bannier, Corelight)
* Fix ruff 'C4' lints. (Benjamin Bannier, Corelight)
* Migrate Python linting to ruff. (Benjamin Bannier, Corelight)
* Bump pre-commit hooks. (Benjamin Bannier, Corelight)
* Upgrade Python to 3.9 with pyupgrade. (Benjamin Bannier, Corelight)
1.12.0-dev.216 | 2024-12-06 10:34:31 +0100
* Introduce `deprecated` helper function in Spicy validator. (Benjamin Bannier, Corelight)
* Add test for using both `[]` and `&count`. (Benjamin Bannier, Corelight)
* GH-1938: Deprecate `&count` attribute. (Benjamin Bannier, Corelight)
The preferred way to indicate how many elements should be parsed for a
vector has been vector syntax over `&count` for a long time, e.g.,
```
type X = unit {
    : uint8[] &count=42; # AVOID.
    : uint8[42]; # PREFER.
};
```
With this patch `&count` is now deprecated and we emit a warning. We
will remove support for `&count` in a future release.
Closes #1938.
1.12.0-dev.212 | 2024-12-05 10:31:19 +0100
* Fix doc code snippet that won't compile. (Evan Typanski, Corelight)
1.12.0-dev.210 | 2024-12-04 12:08:31 +0100
* Fix issue with type inference for `result` constructor. (Robin Sommer, Corelight)
`global x = result("foo");` would end up having type `string` instead
of `result<string>`.
* GH-1856: Disallow dereferencing a `result<void>` value. (Robin Sommer, Corelight)
Closes #1856.
* GH-1856: Teach `&requires` to accept condition-tests expressions. (Robin Sommer, Corelight)
This now allows creating custom error messages when a `&requires`
condition fails. Example:
    type Foo = unit {
        x: uint8 &requires=($$ == 1 : error"Deep trouble!'");
        # or, shorter:
        y: uint8 &requires=($$ == 1 : "Deep trouble!'");
    };
Internally, `&requires` now always wants a condition-test expression.
That remains transparent to the user, however, because the resolver
knows how to coerce a pure boolean `&requires` expression into a
condition test, crafting an error message that happens to look like
what we used to produce in the past.
Closes #1856.
* Teach `assert` to accept a `result<void>` value. (Robin Sommer, Corelight)
This allows using the new condition-test operator with assert:
assert 3 == 4 : error"not equal"
Because of the RHS of a condition test coercing a `string` to `error`
automatically, this now also covers the existing syntax for `assert`.
So the above is now (externally and internally) the same as:
assert 3 == 4 : "not equal"
* Add new "condition-test" operator. (Robin Sommer, Corelight)
This adds a new expression operator to both HILTI and Spicy:
`COND : ERROR`
where `COND` is a boolean expression, and `ERROR` is an expression of
type `error`. When evaluated, the new operator yields a value of type
`result<void>`, which will be true if the `COND` is true, and set to
`ERROR` if `COND` is false. In other words, this is a short-cut to
both test a condition and provide an error message in case it fails.
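The semantics of `COND : ERROR` can be modeled in C++ as follows. This is only a sketch: `ResultVoid` and `condition_test` are hypothetical names standing in for `result<void>` and the operator, and the error is a plain string rather than a dedicated `error` type. The sketch also evaluates the error argument eagerly, which says nothing about how the real operator evaluates `ERROR`.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for `result<void>`: empty on success, otherwise
// holding the error message.
struct ResultVoid {
    std::optional<std::string> error;

    // True if the condition held, i.e., no error is set.
    explicit operator bool() const { return ! error.has_value(); }
};

// Models `COND : ERROR`: success if `cond` is true, else the given error.
inline ResultVoid condition_test(bool cond, std::string error) {
    if ( cond )
        return ResultVoid{};

    return ResultVoid{std::move(error)};
}
```

With this model, `condition_test(3 == 4, "not equal")` evaluates to false and carries the message `not equal`, mirroring the `assert` example above.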
1.12.0-dev.204 | 2024-11-28 13:57:57 +0100
* Spell all `Attribute::Kind` values in Pascal case. (Benjamin Bannier, Corelight)
We are not 100% consistent, but we seem to prefer PascalCase over
SNAKE_UPPERCASE for enum values. Rename the recently introduced
`Attribute::Kind` values to that style.
An intended side effect of this is that we now avoid collisions with a
possibly defined `DEBUG` macro (e.g., Zeek defines it).
1.12.0-dev.202 | 2024-11-26 18:31:48 +0100
* GH-1901: Prefer enum over string for attributes. (Evan Typanski, Corelight)
1.12.0-dev.200 | 2024-11-25 16:14:56 +0100
* GH-1920: Fix some inconsistencies in random access docs. (Evan Typanski, Corelight)
* GH-1914: Make `$$` documentation more precise. (Evan Typanski, Corelight)
The documentation would refer to `$$` in an attribute and hook with
similar language (the "parsed" value). But, in an attribute, it refers
to the value before any conversions. In a hook, it refers to the value
after any conversions.
Now, "parsed" is used to refer to the pre-conversion value, and "final"
is used to refer to the post-conversion value. Those terms were chosen
because they make sense when no conversions are applied (or when `$$` is
used *in* a conversion). But, "final" can still be a little confusing,
since you can write to `$$`.
1.12.0-dev.196 | 2024-11-25 08:30:19 +0100
* GH-1918: Fix potential segfault with stream iterators. (Robin Sommer, Corelight)
When trimming off the beginning of a stream, an existing iterator
could end up dereferencing its internal chunk pointer even if the
chunk now no longer existed. The issue was inside the
increment/decrement operations, which didn't check if the *current*
iterator offset was still valid (because only then the current chunk
is guaranteed to be valid too).
* GH-1918: Add regression tests triggering #1918. (Robin Sommer, Corelight)
1.12.0-dev.193 | 2024-11-21 14:48:06 +0100
* GH-1919: Validate that sets are sortable. (Evan Typanski, Corelight)
1.12.0-dev.191 | 2024-11-14 11:18:29 +0100
* Locally disable some warnings for >=gcc-13. (Benjamin Bannier, Corelight)
fedora-40 and fedora-41 come with GCC versions which trigger new
warnings.
- Some uses of `visitor::range` trigger dangling-reference false
positives, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107532.
- The dtor of `Location` triggers maybe-uninitialized false positives in
`hilti::rt::filesystem::path` for `_file`.
Disable these warnings locally if we are building with >=gcc-13 for now.
* Bump Fedora versions in CI. (Benjamin Bannier, Corelight)
1.12.0-dev.188 | 2024-11-07 15:19:35 +0100
* Make units encompass attributes in location. (Evan Typanski, Corelight)
* GH-1898: Move type alias attribute validation to parser. (Evan Typanski, Corelight)
Closes #1898
This was originally in a validator, but there's some sense that this is
a parsing issue since attributes should not be allowed at all on most
constructs of that form.
Note that a lot of the cleanup here was meant to span removing the
`opt_attributes` field in `type_decl` altogether, but the fallout from
that is pretty large. It ended up being less cleanup and more
rearchitecting. This way still avoids the hacky test for type aliases in
the validator while not changing the AST structure.
1.12.0-dev.182 | 2024-11-06 14:19:55 +0100
* GH-1859: Improve error message when a unit parameter is used as a
field. (Robin Sommer, Corelight)
1.12.0-dev.180 | 2024-11-06 08:41:55 +0100
* GH-1913: Avoid redundant computation during literal parsing. (Robin Sommer, Corelight)
* GH-1910: Optimize parsing of literal bytes. (Robin Sommer, Corelight)
We now create the bytes instance representing the literal as a global
singleton to avoid instantiating it over and over again.
Closes #1910.
* Introduce `UnsafeConstIterator` for bytes instances. (Robin Sommer, Corelight)
An unsafe iterator offers fast but unchecked access to the data. We
also rename `bytes::Iterator` to `bytes::SafeConstIterator` so that
for bytes we now follow the same two-tiered iterator structure as for
streams. We then switch some library code over to now use unsafe
iterators, gaining a noticeable speed-up in some cases.
* Add infrastructure to create and cache global constants. (Robin Sommer, Corelight)
So far we had replicated low-level code a few times that cached
constants through a global declaration for reuse. This now factors out
that logic into a central method inside the Spicy code generator.
This isn't full interning, which I'm not sure we want/need, but covers
the performance use case that has now come up a few times.
* Replace our poor hash function with `std::hash()`. (Robin Sommer, Corelight)
1.12.0-dev.173 | 2024-10-28 10:22:29 +0100
* GH-1908: Fix performance regression when parsing bytes. (Robin Sommer, Corelight)
Turns out our improved error messages were adding additional overhead
because we were now constructing them through `fmt()` each time we
needed more data, independent of whether there was actually going to
be an error reported.
This adds a second version of `waitForInput()` that doesn't receive an
already prepared error message, but just returns false on error so
that the caller can throw itself.
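The idea behind the second variant can be sketched in C++ as below. All names (`Input`, `wait_for_input`, `parse_bytes`) and the message text are hypothetical and only illustrate deferring the message construction to the failure path; this is not the runtime's actual interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the input stream: how many bytes are buffered.
struct Input {
    std::size_t available = 0;
};

// Returns false on insufficient input instead of throwing, so the caller
// does not need to prepare an error message up front.
inline bool wait_for_input(const Input& in, std::size_t needed) { return in.available >= needed; }

inline void parse_bytes(const Input& in, std::size_t needed) {
    if ( ! wait_for_input(in, needed) )
        // The message is built only here, on the failure path.
        throw std::runtime_error("expected " + std::to_string(needed) + " bytes, have " +
                                 std::to_string(in.available));
}
```

On the common success path no string formatting happens at all.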
* GH-1857: Support `&requires` for individual vector items. (Robin Sommer, Corelight)
1.12.0-dev.169 | 2024-10-25 10:09:50 +0200
* GH-1895: No longer escape backslashes when printing strings or bytes. (Robin Sommer, Corelight)
Escaping backslashes in `print` output seems both unnecessary and
confusing.
* Rename `expandEscapes()` to `expandUTF8Escapes`. (Robin Sommer, Corelight)
For better readability to avoid confusion about the semantics of the
function.
* Introduce style flags for functions rendering values into strings. (Robin Sommer, Corelight)
This is just cleanup for better readability. I also fixed a couple of
cases where we now escape control characters where previously we
wouldn't, but I believe should have. Otherwise no functional change.
* GH-1893: Encompass child node's location in parent. (Evan Typanski, Corelight)
When a child is added with `addChild`, the parent's location should
(generally) span over that child as well. This primarily helps for cases
where a node doesn't have much of a location until it gets children
added - like `AttributeSet`. The locations for those should encompass
all of the attributes within the set. That logic applies for any node
with a child: if it's the child, then its location should reflect that.
1.12.0-dev.163 | 2024-10-24 09:59:35 +0200
* GH-1803: Fix namespacing of `hilti` IDs in Spicy-side diagnostic output. (Robin Sommer, Corelight)
We now show them with a `spicy` prefix, which makes more sense for
users.
* Add printer plugin hook to customize ID printing. (Robin Sommer, Corelight)
Because IDs are no longer AST nodes, we did not have a way for a
compiler plugin's printing code to modify how it would like them to be
printed. This adds a corresponding hook. It's not used anywhere yet,
but will be soon.
In addition, we add a notion of "user-visibility" to the printer API
so that printing code knows whether the resulting output is something
that will be shown to the user (e.g., in diagnostics), or remains
internal (e.g., raw code output, debugging output). Likewise not used
yet, but will be soon.
We also clean up the printer API a little bit.
* Add scope guard utility class. (Robin Sommer, Corelight)
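For readers unfamiliar with the pattern, a scope guard generally looks like the following C++17 sketch. The class name and interface here are assumptions for illustration and need not match the utility added by this change.

```cpp
#include <utility>

// Hypothetical scope guard: runs the stored callable when the guard goes
// out of scope, no matter how the scope is left.
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F f) : _f(std::move(f)) {}
    ~ScopeGuard() { _f(); }

    // Non-copyable so that the callable runs exactly once.
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    F _f;
};
```

A typical use is `ScopeGuard guard([&] { cleanup(); });` at the top of a block, where `cleanup` is whatever needs to run on exit.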
1.12.0-dev.159 | 2024-10-23 17:33:28 +0200
* Bump pre-commit hooks (Benjamin Bannier, Corelight)
* Remove stray removal of directory in `/tmp` in test. (Benjamin Bannier, Corelight)
1.12.0-dev.156 | 2024-10-21 15:50:46 +0200
* GH-1063: Document arguments to `new` operator. (Robin Sommer, Corelight)
* GH-1858: Fix the literals parsers not following coercions. (Robin Sommer, Corelight)
1.12.0-dev.152 | 2024-10-18 15:31:40 +0200
* GH-1891: Fix GCC 11 compilation failure. (Evan Typanski, Corelight)
1.12.0-dev.150 | 2024-10-17 10:16:57 +0200
* GH-1792: Prioritize error message reporting unknown field. (Robin Sommer, Corelight)
This suppresses some non-interesting follow-up errors.
* GH-1790: Provide proper error message when trying to access an unknown unit field. (Robin Sommer, Corelight)
Returning `type::Unknown` instead of `type::Auto` will let the
resolver process stop, allowing an already existing error message to
kick in later. We were doing it this way already for structs, but not
for units.
This also includes a fix for a bug with finding bitfield ranges by ID,
which was triggered by the change.
* GH-1791: Fix usage of `&convert` with units requiring parameters. (Robin Sommer, Corelight)
The generated code needs to create a temporary instance of the type,
but doesn't have any arguments to provide to it. But that's ok, and we
now let the validation pass and just instantiate a default-constructed
instance.
However, this change now requires an additional validator check on the
Spicy side to ensure fields giving arguments to types do so correctly.
Before we happened to check that implicitly on the HILTI-side through
code that now would let it pass if no arguments were given.
* Factor out logic to validate arguments given to a type. (Robin Sommer, Corelight)
This will allow Spicy's validator to use it as well.
We also add two options to (1) accept usages where no argument is given at
all even when the type would normally require it; and (2) skip any
actual type checking, and just confirm argument count. In a subsequent
change, we'll use (1) to fall-back to a type's default constructor,
and (2) to check type usage inside unit fields where we haven't
fully coerced arguments yet.
1.12.0-dev.144 | 2024-10-16 16:10:40 +0200
* GH-1861: Disallow ignored attributes on type aliases. (Evan Typanski, Corelight)
This may make previously fine code invalid. Since that code would likely
be hiding a bug, that seems okay to me.
Closes #1861
1.12.0-dev.140 | 2024-10-15 16:21:58 +0200
* GH-1871: Fix `&max-size` on unit containing a `switch`. (Robin Sommer, Corelight)
We would advance our input too early, letting the subsequent
`&max-size` check fail.
1.12.0-dev.138 | 2024-10-14 16:16:08 +0200
* GH-1868: Associate source code locations with current fiber instead of current thread. (Robin Sommer, Corelight)
This fixes potential location mix-ups when switching between fibers.
Note that we still need a context-wide fallback location as well
because we're not always running inside a fiber.
I ran a performance comparison before/after and couldn't measure a
difference. Looks like using TLS storage was a case of premature
optimization.
1.12.0-dev.136 | 2024-10-14 14:20:16 +0200
* Disable cpplint iwyu check. (Benjamin Bannier, Corelight)
* Replace a few C-style casts with `static_cast`. (Benjamin Bannier, Corelight)
1.12.0-dev.133 | 2024-10-08 21:50:53 +0200
* Document `continue` statements. (Evan Typanski, Corelight)
1.12.0-dev.131 | 2024-10-08 17:09:30 +0200
* Fix "it's" used as a possessive. (Evan Typanski, Corelight)
1.12.0-dev.129 | 2024-10-07 13:02:38 +0200
* Stop using deprecated pre-commit stage names. (Benjamin Bannier, Corelight)
The names we used have been deprecated for some time, see
https://github.com/pre-commit/pre-commit/issues/2732.
1.12.0-dev.127 | 2024-10-05 13:21:27 +0200
* GH-1867: Fix infinite loops with recursive types. (Evan Typanski, Corelight)
Closes #1867
There are two different cases where infinite loops happen with recursive
types. First, a type may reference itself (`type Data = Data`). Second,
a type may reference itself inside some other type (`type Data =
vector<Data>`).
The first is fixed with a recursion limit. Since the type simply cannot
resolve, it doesn't get anywhere near codegen. You could detect cycles,
but that introduces some extra overhead and complexity that shouldn't
be needed in a "simple" function.
The second is fixed with an ad-hoc "occurs" check in type unification.
That just detects cycles and aborts if one is present. This could be
placed at some other place in the "resolve until convergence" loop, but
it seems best put closest to the source of the issue.
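The second fix can be illustrated with a generic cycle check over type references. This C++ sketch is an assumption-laden simplification: it models types as names in a map (`TypeGraph`, `occurs`, and `is_recursive` are hypothetical) rather than as AST nodes inside the unifier.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each type name maps to the type names it refers to.
using TypeGraph = std::map<std::string, std::vector<std::string>>;

// Returns true if `target` is reachable from `current` by following references.
inline bool occurs(const TypeGraph& g, const std::string& target, const std::string& current,
                   std::set<std::string>& seen) {
    auto it = g.find(current);
    if ( it == g.end() )
        return false;

    for ( const auto& dep : it->second ) {
        if ( dep == target )
            return true;

        // Visit each referenced type only once so the search terminates.
        if ( seen.insert(dep).second && occurs(g, target, dep, seen) )
            return true;
    }

    return false;
}

// A type is recursive if it occurs somewhere inside its own definition.
inline bool is_recursive(const TypeGraph& g, const std::string& name) {
    std::set<std::string> seen;
    return occurs(g, name, name, seen);
}
```

For `type Data = vector<Data>`, the graph has `Data` referring to `vector<Data>` and vice versa, so `is_recursive` reports a cycle and processing can abort.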
1.12.0-dev.124 | 2024-09-30 17:43:41 +0200
* GH-1875: Fix potential nullptr dereference when comparing streams. (Robin Sommer, Corelight)
Because we are operating on unsafe iterators, we need to catch when one
goes out of bounds.
1.12.0-dev.122 | 2024-09-30 14:18:09 +0200
* GH-1874: Add new library function `spicy::bytes_to_mac`. (Robin Sommer, Corelight)
```
## Returns a bytes value rendered as a MAC address string (i.e., colon-separated hex bytes).
public function bytes_to_mac(value: bytes) : string;
```
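As a rough picture of the output format, the rendering could be written in C++ like this. It is a sketch, not the library's implementation: it takes a `std::vector` instead of a `bytes` value, and the choice of uppercase hex digits is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Renders bytes as colon-separated, two-digit hex values.
inline std::string bytes_to_mac(const std::vector<std::uint8_t>& value) {
    static const char* const hex = "0123456789ABCDEF"; // Uppercase is an assumption.

    std::string out;
    out.reserve(value.size() * 3);

    for ( std::size_t i = 0; i < value.size(); ++i ) {
        if ( i > 0 )
            out += ':';

        out += hex[value[i] >> 4];   // high nibble
        out += hex[value[i] & 0x0f]; // low nibble
    }

    return out;
}
```

Appending nibbles through a lookup table avoids per-byte formatting calls, which is one plausible way to keep such a function cheap.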
* Optimize `spicy::bytes_to_hexstring` and `spicy::bytes_to_mac`. (Robin Sommer, Corelight)
1.12.0-dev.119 | 2024-09-30 13:37:10 +0200
* GH-1846: Fix bug with capture groups. (Robin Sommer, Corelight)
When extracting the data matching capture groups we'd take it from the
beginning of the stream, not the beginning of the current view, even
though the latter is what we are matching against.
* Add missing trim after matching a regular expression. (Robin Sommer, Corelight)
1.12.0-dev.116 | 2024-09-30 13:35:49 +0200
* GH-1842: Fix when input redirection becomes visible. (Robin Sommer, Corelight)
With `&parse-at/from` we were updating the internal state on our
current position immediately, meaning they were visible already when
evaluating other attributes on the same field afterwards, which is
unexpected.
1.12.0-dev.114 | 2024-09-30 10:24:42 +0200
* GH-1844: Fix nested look-ahead parsing. (Robin Sommer, Corelight)
When parsing nested vectors all using look-ahead, we need to return
control back to upper level when an inner look-ahead isn't found.
This may change the error message for "normal" look-ahead parsing (see
test baseline), but the new one seems fine and potentially even
better.
* Apply compiler suggestion. (Robin Sommer, Corelight)
1.12.0-dev.111 | 2024-09-26 20:00:42 +0200
* Remove `ast-stats` output from tests baselines. (Robin Sommer, Corelight)
These are too noisy as they update with every AST change. Originally I
included `ast-stats` into some test baselines to ensure the new AST
infrastructure isn't doing weird stuff, but seems that's working out
ok.
1.12.0-dev.109 | 2024-09-26 17:29:17 +0200
* Fix parsing ambiguity with properties. (Robin Sommer, Corelight)
In the past, we had special-cased properties in our Flex/Bison parser
so that when parsing an expression, they wouldn't be recognized.
However, that now led to an error field hook of the form `x: bytes
&size=42 %error` being parsed as `&size=(42 % error)`. We now switch
to white-listing all known properties, just as we already do for
attributes. That way conflicts should be extremely rare.
* Redo error handling docs (Robin Sommer, Corelight)
The old text was very outdated. This extends the content, documents
the new per-field `%error` handler, and moves it all into the
"parsing" section to have it closer to the error recovery content.
* GH-1824: Add support for field-local `%error` handlers. (Robin Sommer, Corelight)
We now support attaching an `%error` handler to an individual field:
    type Test = unit {
        a: b"A";
        b: b"B" %error { print "field B %error", self; }
        c: b"C";
    };
With input `AxC`, that handler will trigger, whereas with `ABx` it
won't. If the unit had a unit-wide `%error` handler as well, that one
would trigger in both cases (i.e., for `b`, in addition to its field
local handler).
The handler can also be provided separately from the field:
on b %error { ... }
In that separate version, one can receive the error message as well by
declaring a corresponding string parameter:
on b(msg: string) %error { ... }
This works externally, from outside the unit, as well:
on Test::b(msg: string) %error { ... }
This is rebased on top of `topic/robin/optimize-type-parsing` so that
we get the peephole optimizer.
* Refactor field hook attributes. (Robin Sommer, Corelight)
Currently we only have two types of field hooks: standard hooks and
`foreach` hooks. To prepare for more types, this refactors the code
to represent the type with an `enum` instead of a `foreach` boolean. It
also moves validation of permitted attributes from the parser to
the validator.
1.12.0-dev.104 | 2024-09-26 14:14:36 +0200
* Fine-tune `bytes` literal parsing further. (Robin Sommer, Corelight)
This now optimizes the generated code for parsing a `bytes` literal
based on what we're going to use it for, or not (require literal vs.
look-ahead vs skip).
The standard case of parsing an expected literal now looks like
this:
```
# Begin parsing production: Ctor: b1 -> b"abc" (const bytes)
(*self).b1 = spicy_rt::expectBytesLiteral(__data, __cur, b"abc", "../tests/spicy/types/bytes/parse-length.spicy:20:10-20:15", Null);
__cur = __cur.advance(3);
if ( __trim )
    (*__data).trim(begin(__cur));
# End parsing production: Ctor: b1 -> b"abc" (const bytes)
```
* Add peephole optimization simplifying rethrowing catch blocks. (Robin Sommer, Corelight)
The optimization turns this:
```
function void foo() {
    try {
        ...
    } catch {
        throw;
    }
}
```
into
```
function void foo() {
    {
        ...
    }
}
```
It would be even nicer if we didn't need the braces around the
remaining block, but it's generally not safe to remove them because if
the block declares any locals their life-times and visibility would
change.
* GH-1592: Add peephole optimizer for final AST tuning. (Robin Sommer, Corelight)
We use this to remove two statement constructs that the main optimizer
may leave behind:
1. `default<void>()`
2. `(*self).__error = __error; __error = (*self).__error;`
The second case is a quite specific situation that eventually, once we
have CFG/DFG tracking, the main optimizer should be able to cover more
generically. However, for now, it's just not nice to always have
these blocks in the generated C++ code, so adding this special case
seems useful.
A couple of notes on (2):
- Per #1592, case 2 may also have overhead. Closes #1592.
- Technically, this optimization isn't always correct: subsequent
code could assume that `(*self).__error` is set, whereas after
removal it's not (or not to the expected value). However,
`__error` is strictly-internal state, and we know that we don't
use it any differently, so this seems ok until we have more
general optimizer logic.
* Disable check for reserved IDs with `--skip-validation`. (Robin Sommer, Corelight)
This allows writing tests that use internal IDs.
* Simplify parsing of literals. (Robin Sommer, Corelight)
There are two changes in here:
1. For grammars that don't use look-ahead, we skip the runtime check for a
pending look-ahead symbol, because we know we will never have one. This removes
generated code of the form `if ( _lah ) { ... }`. This change needs a bit of
machinery unfortunately because we need to get the information about look-ahead
usage over into the codegen for literal parsing.
2. For bytes literals, we now outsource their parsing to a runtime function to
make the generated code simpler. As a side effect this also provides more
informative error messages when the literal isn't found.
Taking the two together means that the code for parsing a plain `b"Foo"`
literal may look like this now:
spicy_rt::expectBytesLiteral(__data, __cur, b"Foo", "b.spicy:4:10-4:15", Null);
__cur = __cur.advance(3);
if ( __trim )
    (*__data).trim(begin(__cur));
(*self).foo = b"Foo";
* Optimize parsing for `bytes &size=N`. (Robin Sommer, Corelight)
* Add mode for optimized type parsing. (Robin Sommer, Corelight)
This provides the surrounding infrastructure, but does not yet
implement it for any type.
* Refactor type parsing modes. (Robin Sommer, Corelight)
So far we had a boolean flag to differentiate between "normal" and
"try" parsing. We turn this flag into an enum now so that we can more
easily extend the set of modes later. No functional change otherwise.
* Refactor pre/post parsing logic. (Robin Sommer, Corelight)
This splits out the `&size/&max-size` handling so that we can
special-case that later.
* Cleanup: replace optionals with pointers. (Robin Sommer, Corelight)
For AST nodes, we can/should use nullptrs instead of unset optionals.
The versions using `optional` were still a left-over from the old AST
code.
1.12.0-dev.93 | 2024-09-26 13:07:41 +0200
* Disallow unit variables nested into other items. (Robin Sommer, Corelight)
Initialization would be ill-defined.
* GH-1839: Support `if`-blocks in unit. (Robin Sommer, Corelight)
We now support `if` around a block of unit items:
    type X = unit {
        x: uint8;
        if ( self.x == 1 ) {
            a1: bytes &size=2;
            a2: bytes &size=2;
        };
    };
One can also add an `else`-block:
    type X = unit {
        x: uint8;
        if ( self.x == 1 ) {
            a1: bytes &size=2;
            a2: bytes &size=2;
        }
        else {
            b1: bytes &size=2;
            b2: bytes &size=2;
        };
    };
* Add a new unit item `Block` that stores a sequence of subitems. (Robin Sommer, Corelight)
We now have a new unit item `Block` that represents a sequence of
sub-items. This goes along with a new production `Block` that the item
gets turned into inside the grammar.
The implementation reuses logic from `unit::item::switch_::Case`. We
then refactor `Case` to use the new `Block` instead of continuing to
maintain its own list of items.
We also refactor some other logic previously located with `Switch`
to now apply to `Blocks` as well.
Optionally, a `Block` can have a conditional expression, an
"else"-block, as well as attributes with it. We don't use them yet, but
will soon; tests then coming as well. However, we already also
refactor some attribute logic that previously went just with the
`Switch` unit item to now also apply to the `Block` item, so that the
two can support the same attributes.
* Remove `production::Boolean`. (Robin Sommer, Corelight)
Turns out this was neither used nor fully implemented.
* Run a final resolver pass after optimization. (Robin Sommer, Corelight)
Optimization may leave node state unset, such as canonical IDs if we
created new nodes. The final resolver pass gets that back to a
well-defined state.
This also moves the debug output for `ast-final` out of the resolver
loop; it now executes at the very end after optimization. That's a
change I've been meaning to do for a while because the "final" AST
wasn't really final. `ast-resolved` now shows what used to be
`ast-final` (and no longer each iteration of the resolver).
Test coming with subsequent commit, where a new test would break
otherwise.
1.12.0-dev.87 | 2024-09-26 13:07:06 +0200
* Compile generated C++ with `-O0` in debug mode. (Robin Sommer, Corelight)
* Bump centos-stream in documentation. (Benjamin Bannier, Corelight)
* Bump latest release in docs. (Benjamin Bannier, Corelight)
1.12.0-dev.83 | 2024-09-19 08:58:15 +0200
* Fix attributes not getting validated. (Evan Typanski, Corelight)
This particularly fixes the case where a ctor within a field will not
have its attributes validated because it does not go through
`isParseableType`. The "generic attribute checks" are moved to apply to
all attributes, while type-specific ones remain in `isParseableType`.
1.12.0-dev.81 | 2024-09-18 14:42:34 +0200
* Add internal error for unvalidatable vectors. (Evan Typanski, Corelight)
Vectors were changed to never be a vector-of-type, just a
vector-of-items that redirect to a type. This allows validating
attributes and whatnot properly, which only apply to fields, not types.
The consequence is that a vector-of-types should be impossible, or else
there is an error in the parser's workarounds to get to that state,
which is a bug.
Add an internal error so that case never surfaces without a big scary
error.
* Move legacy vector syntax check to new production. (Evan Typanski, Corelight)
* Remove vector builders for type/ids. (Evan Typanski, Corelight)
Currently, vectors need the inner type to actually point to a field.
This removes the repeat from builder methods that would otherwise need a
field but don't so that they aren't mistakenly added.
1.12.0-dev.77 | 2024-09-11 08:18:38 +0200
* GH-1847: Fix resynchronization issue with trimmed input. (Robin Sommer, Corelight)
When input had been trimmed, `View::advanceToNextData` could end up
returning a view starting ahead of the valid area.
* Add missing printer support for exception values. (Robin Sommer, Corelight)
* GH-1860: Fix parsing for vectors of literals. (Robin Sommer, Corelight)
This was broken in two ways:
1. with the `(LITERAL)[]` syntax, the parser would not recognize literals
using type constructors
2. with the syntax `LITERAL[]`, we'd try to store the parsed value
into a vector
* Fix error message. (Robin Sommer, Corelight)
1.12.0-dev.71 | 2024-09-06 16:29:02 +0200
* Move some `scoped_id` out for better locations. (Evan Typanski, Corelight)
* Fix `vector<>` syntax missing diagnostics. (Evan Typanski, Corelight)
* GH-1832: Fail for vectors with bytes but no stop. (Evan Typanski, Corelight)
Cases with parentheses were caught before this change. The comments
indicate behavior before the change, even though both cases should fail:
    public type BytesVectors = unit {
        : bytes[10] &size=5; # No diagnostic - oh no!
        : (bytes)[10] &size=5; # Error - good
    };
That's because a new `_anon` field is created with the parentheses. This
turns the `bytes` into an item, which can have its own `&size` and gets
checked later. But `bytes` by itself cannot have the `&size` attribute
since it's just a type - the vector will use its own `&size` attribute!
The fix makes all types in vectors use the anonymous field instead, so
their lack of attributes is properly modeled during validation.
1.12.0-dev.67 | 2024-09-04 12:44:42 +0200
* GH-1852: Fix `skip` with units. (Robin Sommer, Corelight)
For unit parsing with `skip`, we would create a temporary instance
but wouldn't properly initialize it, meaning for example that
parameters weren't available. We now generally fully initialize any
destination, even if temporary.
* Do not by default skip CI in update-changes cfg. (Benjamin Bannier, Corelight)
We rely on CI to create releases, so having `[skip CI]` in our
update-changes config was always a hassle which needed to be edited out
when creating a release (just tagging would autocommit with the message
template without a way to change the commit message before tagging).
Drop this part to make releases easier and less error prone.
1.12.0-dev.63 | 2024-08-26 17:10:07 +0200
* Bump 3rdparty/any from `7c76129` to `a05d5ad` (dependabot[bot])
1.12.0-dev.61 | 2024-08-14 14:04:27 +0200
* Suppress warning `invalid-offsetof`. (Robin Sommer, Corelight)
We now run into these in CI due to `-Werror`.
Also clean up existing duplication of
`-Wno-unused-command-line-argument`.
* GH-1835: Fix computation of declaration dependencies. (Robin Sommer, Corelight)
One more branch we had to follow to track down all dependencies
between global declarations: external qualified types.
1.12.0-dev.58 | 2024-08-09 15:57:52 +0200
* Add `string::lower` and `string::upper` (Evan Typanski, Corelight)
These already had full implementations in the `string` namespace, so this
just hooks them up so that they can be used on a string type in a Spicy
file. It also switches those implementations to use `string_view`.
* Fix `split` and `split1` docs to be more accurate (Evan Typanski, Corelight)
The documentation for the bytes method seemed incorrect (or, at least,
confusing) in that an empty separator did *not* split on white space. This
updates the documentation to describe the actual behavior
(especially for `split1`).
Note, the behavior is a bit confusing, unintuitive, and maybe
inconsistent between the implementations of `split` and `split1`.
Namely, `split1` will sometimes put the entire value in the first
element of the tuple (if a match isn't found) or the second element (if
the separator is just an empty string).
For now this just updates the documentation, but I'd also welcome a
change in behavior.
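The two `split1` edge cases described above can be modeled with a small C++ sketch. This is only an illustration of the behavior as stated in this entry, not the runtime's implementation; `split1Model` is a hypothetical name.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical model of the `split1` behavior described in this entry.
inline std::pair<std::string, std::string> split1Model(std::string_view s, std::string_view sep) {
    // Empty separator: the entire value ends up in the second element.
    if ( sep.empty() )
        return {std::string(), std::string(s)};

    auto pos = s.find(sep);

    // No match: the entire value ends up in the first element.
    if ( pos == std::string_view::npos )
        return {std::string(s), std::string()};

    // Regular case: split at the first occurrence of the separator.
    return {std::string(s.substr(0, pos)), std::string(s.substr(pos + sep.size()))};
}
```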
* Add `string::split` and `string::split1` helpers (Evan Typanski, Corelight)
This doesn't do any work to fix the TODOs that were already in the Bytes
version. Also, there is a small unclear bit about the documentation for
the `split` op in both Bytes and now string, namely that an empty `sep`
argument should make the break take place at sequences of whitespace,
but it does not do that. I may also be misreading the documentation,
but one of the two should probably be clarified.
* Add `string::starts_with` helper (Evan Typanski, Corelight)
1.12.0-dev.53 | 2024-08-09 15:45:05 +0200
* GH-1831: Fix optimizer regression. (Robin Sommer, Corelight)
We were no longer marking types as used that are referenced through a
type name.
1.12.0-dev.51 | 2024-08-09 15:43:51 +0200
* Fix AST dependency computation. (Robin Sommer, Corelight)
Turns out the logic wasn't quite right yet. It worked well enough if
all units get wrapped into `ValueReference` because then we just need
a forward declaration to make that work; and those we would always
output first. But it wouldn't actually track dependencies correctly to
output them in the right order so that full structs are available where
needed. This commit reworks that logic. Turns out this also further
simplifies logic and state tracking, which is a good sign.
* Provide dependency information to AST transformations. (Robin Sommer, Corelight)
Before, we computed dependency information only once just before final
codegen, meaning that Spicy-to-HILTI lowering would not have it
available. This adds a separate dependency computation just before the
transformation pass. While this isn't used yet, the change prepares
for future functionality leveraging dependencies during lowering (like
selective application of `&on-heap`).
* Make AST dependency information deterministic. (Robin Sommer, Corelight)
The dependency information returned by the context wasn't sorted, with
order depending on pointer values. That means that users of that
information would end up processing dependencies in an undetermined
order, making their results non-deterministic.
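One common way to get a reproducible order is to sort by a stable key rather than by address. The following C++ sketch assumes each declaration carries a unique string ID; `Decl`, `canonical_id`, and `sortedDependencies` are hypothetical names, and the entry above does not say which key the actual fix uses.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a declaration with a stable, unique ID.
struct Decl {
    std::string canonical_id;
};

// Returns the dependencies ordered by ID. Pointer values can differ from run
// to run, whereas the IDs do not, so consumers always see the same sequence.
inline std::vector<const Decl*> sortedDependencies(std::vector<const Decl*> deps) {
    std::sort(deps.begin(), deps.end(),
              [](const Decl* a, const Decl* b) { return a->canonical_id < b->canonical_id; });
    return deps;
}
```

With this, two callers passing the same set of declarations in different orders receive identical results.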