Better String library
---------------------
by Paul Hsieh
The bstring library is an attempt to provide improved string processing
functionality to the C and C++ languages. At the heart of the bstring library
(Bstrlib for short) is the management of "bstring"s which are a significant
improvement over '\0' terminated char buffers.
===============================================================================
Motivation
----------
The standard C string library has serious problems:
1) Its use of '\0' to denote the end of the string means knowing a
string's length is O(n) when it could be O(1).
2) It imposes an interpretation for the character value '\0'.
3) gets() always exposes the application to a buffer overflow.
4) strtok() modifies the string it's parsing and thus may not be usable in
programs which are re-entrant or multithreaded.
5) fgets has the unusual semantic of ignoring '\0's that occur before
'\n's are consumed.
6) There is no memory management, and actions performed such as strcpy,
strcat and sprintf are common places for buffer overflows.
7) strncpy() doesn't '\0' terminate the destination in some cases.
8) Passing NULL to C library string functions causes an undefined NULL
pointer access.
9) Parameter aliasing (overlapping, or self-referencing parameters)
within most C library functions has undefined behavior.
10) Many C library string function calls take integer parameters with
restricted legal ranges. Parameters passed outside these ranges are
not typically detected and cause undefined behavior.
So the desire is to create an alternative string library that does not suffer
from the above problems and adds in the following functionality:
1) Incorporate string functionality seen from other languages.
a) MID$() - from BASIC
b) split()/join() - from Python
c) string/char x n - from Perl
2) Implement analogs to functions that combine stream IO and char buffers
without creating a dependency on stream IO functionality.
3) Implement the basic text editor-style functions insert, delete, find,
and replace.
4) Implement reference based sub-string access (as a generalization of
pointer arithmetic.)
5) Implement runtime write protection for strings.
There is also a desire to avoid "API-bloat". So functionality that can be
implemented trivially in other functionality is omitted. So there is no
left$() or right$() or reverse() or anything like that as part of the core
functionality.
Explaining Bstrings
-------------------
A bstring is basically a header which wraps a pointer to a char buffer. Let's
start with the declaration of a struct tagbstring:
    struct tagbstring {
        int mlen;
        int slen;
        unsigned char * data;
    };
This definition is considered exposed, not opaque (though it is neither
necessary nor recommended that low level maintenance of bstrings be performed
whenever the abstract interfaces are sufficient). The mlen field (usually)
describes a lower bound for the memory allocated for the data field. The
slen field describes the exact length for the bstring. The data field is a
single contiguous buffer of unsigned chars. Note that the existence of a '\0'
character in the unsigned char buffer pointed to by the data field does not
necessarily denote the end of the bstring.
To be a well formed modifiable bstring the mlen field must be at least the
length of the slen field, and slen must be non-negative. Furthermore, the
data field must point to a valid buffer in which access to the first mlen
characters has been acquired. So the minimal check for correctness is:
(slen >= 0 && mlen >= slen && data != NULL)
bstrings returned by bstring functions can be assumed to be either NULL or
satisfy the above property. (When bstrings are only readable, the mlen >=
slen restriction is not required; this is discussed later in this section.)
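
The two correctness checks can be written out as small predicates. The following is a minimal sketch, not part of Bstrlib: the struct is re-declared locally so the fragment compiles on its own, and the names bstr_writable_ok and bstr_readable_ok are hypothetical.

```c
#include <stddef.h>

/* Local copy of the header layout described above, so that the sketch is
   self-contained. */
struct tagbstring {
    int mlen;
    int slen;
    unsigned char * data;
};

/* Hypothetical helper: minimal check for a modifiable bstring. */
static int bstr_writable_ok (const struct tagbstring * b) {
    return b != NULL && b->slen >= 0 && b->mlen >= b->slen && b->data != NULL;
}

/* Hypothetical helper: reduced check for a read-only bstring (mlen is not
   examined). */
static int bstr_readable_ok (const struct tagbstring * b) {
    return b != NULL && b->slen >= 0 && b->data != NULL;
}
```

Neither predicate can tell whether data really points at mlen accessible characters; that part of the contract remains the caller's responsibility.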
A bstring itself is just a pointer to a struct tagbstring:
typedef struct tagbstring * bstring;
Note that use of the prefix "tag" in struct tagbstring is required to work
around the inconsistency between C and C++'s struct namespace usage. This
definition is also considered exposed.
Bstrlib basically manages bstrings allocated as a header and an associated
data-buffer. Since the implementation is exposed, they can also be
constructed manually. Functions which mutate bstrings assume that the header
and data buffer have been malloced; the bstring library may perform free() or
realloc() on both the header and data buffer of any bstring parameter.
Functions which return bstrings create new bstrings. The string memory is
freed by a bdestroy() call (or using the bstrFree macro).
The following related typedef is also provided:
typedef const struct tagbstring * const_bstring;
which is also considered exposed. These are directly bstring compatible (no
casting required) but are just used for parameters which are meant to be
non-mutable. So in general, bstring parameters which are read as input but
not meant to be modified will be declared as const_bstring, and bstring
parameters which may be modified will be declared as bstring. This convention
is recommended for user written functions as well.
Since bstrings maintain interoperability with C library char-buffer style
strings, all functions which modify, update or create bstrings also append a
'\0' character immediately after the last character of the string (that is,
at data[slen]). This trailing '\0' character is not required for bstrings
input to the bstring functions; this is provided solely as a convenience for
interoperability with standard C char-buffer functionality.
Analogs for the ANSI C string library functions have been created when they
are necessary, but have also been left out when they are not. In particular
there are no functions analogous to fwrite, or puts just for the purposes of
bstring. The ->data member of any string is exposed, and therefore can be
used just as easily as char buffers for C functions which read strings.
For those that wish to hand construct bstrings, the following should be kept
in mind:
1) While bstrlib can accept constructed bstrings without terminating
'\0' characters, the rest of the C language string library will not
function properly on such non-terminated strings. This is obvious
but must be kept in mind.
2) If it is intended that a constructed bstring be written to by the
bstring library functions then the data portion should be allocated
by the malloc function and the slen and mlen fields should be entered
properly. The struct tagbstring header is not reallocated, and only
freed by bdestroy.
3) Writing arbitrary '\0' characters at various places in the string
will not modify its length as perceived by the bstring library
functions. In fact, '\0' is a legitimate non-terminating character
for a bstring to contain.
4) For read only parameters, bstring functions do not check the mlen.
I.e., the minimal correctness requirements are reduced to:
(slen >= 0 && data != NULL)
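
As an illustration of points 3 and 4, the sketch below builds a read-only header by hand over a static buffer that contains an embedded '\0'. It is only a sketch under stated assumptions: the struct is re-declared locally, the helper names are hypothetical, and setting mlen to -1 is used here simply as a marker that the buffer must not be written to.

```c
#include <string.h>

struct tagbstring { int mlen; int slen; unsigned char * data; };

/* Bytes with an embedded '\0'. The array is static, not malloc'ed, so a
   header built over it must only ever be used as a read-only parameter. */
static unsigned char raw[] = { 'a', 'b', '\0', 'c', 'd', '\0' };

/* Hypothetical helper: fill in a caller-supplied header by hand. */
static void make_readonly (struct tagbstring * t, unsigned char * data,
                           int len) {
    t->data = data;
    t->slen = len;
    t->mlen = -1;   /* assumption: marks the buffer as not writable */
}

/* Length as seen through the header. */
static int header_length (const struct tagbstring * t) {
    return t->slen;
}

/* Length as seen by the C library, which stops at the first '\0'. */
static size_t c_length (const struct tagbstring * t) {
    return strlen ((const char *) t->data);
}
```

For a header built over the first five bytes of raw, header_length reports 5 while c_length reports 2, which is the discrepancy that point 3 describes.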
Better pointer arithmetic
-------------------------
One built-in feature of '\0' terminated char * strings is that it's very easy
and fast to obtain a reference to the tail of any string using pointer
arithmetic. Bstrlib does one better by providing a way to get a reference to
any substring of a bstring (or any other length delimited block of memory.)
So rather than just having pointer arithmetic, with bstrlib one essentially
has segment arithmetic. This is achieved using the macro blk2tbstr() which
builds a reference to a block of memory and the macro bmid2tbstr() which
builds a reference to a segment of a bstring. Bstrlib also includes
functions for direct consumption of memory blocks into bstrings, namely
bcatblk () and blk2bstr ().
One scenario where this can be extremely useful is when a string contains many
substrings which one would like to pass as read-only reference parameters to
some string consuming function without the need to allocate entire new
containers for the string data. More concretely, imagine parsing a command
line string whose parameters are space delimited. This can only be done for
tails of the string with '\0' terminated char * strings.
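
The command line scenario can be sketched without the library itself. The code below is an illustration only: the struct is re-declared locally, segment_ref and next_token are hypothetical helpers written in the spirit of bmid2tbstr, and the use of mlen = -1 to mark a reference as not writable is an assumption of this sketch.

```c
#include <stddef.h>

struct tagbstring { int mlen; int slen; unsigned char * data; };

/* Hypothetical helper: make *t refer to up to len characters of src
   starting at pos, without copying. Out of range requests are clamped,
   yielding an empty reference in the worst case. */
static void segment_ref (struct tagbstring * t, const struct tagbstring * src,
                         int pos, int len) {
    t->mlen = -1;      /* marks the reference as not writable (assumption) */
    t->slen = 0;
    t->data = (unsigned char *) "";
    if (src == NULL || src->data == NULL || src->slen < 0) return;
    if (pos < 0 || len < 0 || pos > src->slen) return;
    if (len > src->slen - pos) len = src->slen - pos;
    t->data = src->data + pos;
    t->slen = len;
}

/* Find the next space delimited token at or after *pos and describe it in
   *tok. Returns 1 if a token was found, 0 at the end of the string. */
static int next_token (const struct tagbstring * src, int * pos,
                       struct tagbstring * tok) {
    int i, start;
    if (src == NULL || src->data == NULL || pos == NULL || tok == NULL)
        return 0;
    i = *pos;
    if (i < 0) return 0;
    while (i < src->slen && src->data[i] == ' ') i++;
    start = i;
    while (i < src->slen && src->data[i] != ' ') i++;
    *pos = i;
    if (i == start) return 0;
    segment_ref (tok, src, start, i - start);
    return 1;
}
```

Each token is described by a small header on the caller's stack; no character data is allocated or copied, and the source string is never modified.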
Improved NULL semantics and error handling
------------------------------------------
Unless otherwise noted, if a NULL pointer is passed as a bstring or any other
detectably illegal parameter, the called function will return with an error
indicator (either NULL or BSTR_ERR) rather than simply performing a NULL
pointer access, or having undefined behavior.
To illustrate the value of this, consider the following example:
strcpy (p = malloc (13 * sizeof (char)), "Hello,");
strcat (p, " World");
This is not correct because malloc may return NULL (due to an out of memory
condition), and the behaviour of strcpy is undefined if either of its
parameters is NULL. However:
bstrcat (p = bfromcstr ("Hello,"), q = bfromcstr (" World"));
bdestroy (q);
is well defined, because if either p or q is assigned NULL (indicating a
failure to allocate memory) both bstrcat and bdestroy will recognize it and
perform no detrimental action.
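
For comparison, the char buffer fragment can only be made well defined by testing the result of malloc by hand before it is used. A minimal sketch in plain C follows; the function name hello_world is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated "Hello, World", or NULL if malloc fails.
   The caller releases the buffer with free (). */
static char * hello_world (void) {
    char * p = malloc (13 * sizeof (char));
    if (p == NULL) return NULL;     /* the check the fragment above omits */
    strcpy (p, "Hello,");
    strcat (p, " World");
    return p;
}
```

The explicit test has to be repeated at every allocation site, whereas the bstring version folds it into the calls themselves.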
Note that it is not necessary to check any of the members of a returned
bstring for internal correctness (in particular the data member does not need
to be checked against NULL when the header is non-NULL), since this is
assured by the bstring library itself.
bStreams
--------
In addition to the bgets and bread functions, bstrlib can abstract streams
with a high performance read only stream called a bStream. In general, the
idea is to open a core stream (with something like fopen) then pass its
handle as well as a bNread function pointer (like fread) to the bsopen
function which will return a handle to an open bStream. Then the functions
bsread, bsreadln or bsreadlns can be called to read portions of the stream.
Finally, the bsclose function is called to close the bStream -- it will
return a handle to the original (core) stream. So bStreams, essentially,
wrap other streams.
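
To make the idea of a read function more concrete, here is a sketch of an fread-shaped callback that reads from a block of memory instead of a FILE. It assumes that a bNread function takes the same four parameters as fread, with the stream handle passed as a void *; struct memsrc and mem_read are hypothetical names that are not part of Bstrlib.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical core stream: a block of memory plus a read position. */
struct memsrc {
    const unsigned char * base;
    size_t len;
    size_t pos;
};

/* fread-shaped read function: copies up to nelem elements of elsize bytes
   into buff and returns the number of whole elements copied. A return of
   0 signals that the source is exhausted. */
static size_t mem_read (void * buff, size_t elsize, size_t nelem,
                        void * parm) {
    struct memsrc * m = (struct memsrc *) parm;
    size_t avail, n;
    if (buff == NULL || m == NULL || elsize == 0 || m->pos > m->len)
        return 0;
    avail = (m->len - m->pos) / elsize;
    n = (nelem < avail) ? nelem : avail;
    if (n > 0) {
        memcpy (buff, m->base + m->pos, n * elsize);
        m->pos += n * elsize;
    }
    return n;
}
```

Because the callback only sees an opaque handle, the same shape would work for a file, a socket wrapper or another layered stream.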
The bStreams have two main advantages over the bgets and bread (as well as
fgets/ungetc) paradigms:
1) Improved functionality via the bunread function which allows a stream to
unread characters, giving the bStream stack-like functionality if so
desired.
2) A very high performance bsreadln function. The C library function fgets()
(and the bgets function) can typically be written as a loop on top of
fgetc(), thus paying all of the overhead costs of calling fgetc on a per
character basis. bsreadln will read blocks at a time, thus amortizing the
overhead of fread calls over many characters at once.
However, clearly bStreams are suboptimal or unusable for certain kinds of
streams (stdin) or certain usage patterns (a few spotty, or non-sequential
reads from a slow stream.) For those situations, using bgets will be more
appropriate.
The semantics of bStreams allows practical construction of layerable data
streams. What this means is that by writing a bNread compatible function on
top of a bStream, one can construct a new bStream on top of it. This can be
useful for writing multi-pass parsers that don't actually read the entire
input more than once and don't require the use of intermediate storage.
Aliasing
--------
Aliasing occurs when a function is given two parameters which point to data
structures which overlap in the memory they occupy. While this does not
disturb read only functions, for many libraries this can make functions that
write to these memory locations malfunction. This is a common problem of the
C standard library and especially the string functions in the C standard
library.
The C standard string library is entirely char by char oriented (as is
bstring) which makes conforming implementations alias safe for some
scenarios. However no actual detection of aliasing is typically performed,
so it is easy to find cases where the aliasing will cause anomalous or
undesirable behaviour (consider: strcat (p, p).) The C99 standard includes
the "restrict" pointer modifier which allows the compiler to document and
assume a no-alias condition on usage. However, only the most trivial cases
can be caught (if at all) by the compiler at compile time, and thus there is
no actual enforcement of non-aliasing.
Bstrlib, by contrast, permits aliasing and is completely aliasing safe, in
the C99 sense of aliasing. That is to say, under the assumption that
pointers of incompatible types from distinct objects can never alias, bstrlib
is completely aliasing safe. (In practice this means that the data buffer
portion of any bstring and header of any bstring are assumed to never alias.)
With the exception of the reference building macros, the library behaves as
if all read-only parameters are first copied and replaced by temporary
non-aliased parameters before any writing to any output bstring is performed
(though actual copying is very rarely done.)
Besides being a useful safety feature, bstring searching/comparison
functions can improve to O(1) execution when aliasing is detected.
Note that aliasing detection and handling code in Bstrlib is generally
extremely cheap. There is almost never any appreciable performance penalty
for using aliased parameters.
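
The kind of handling involved can be sketched for an append operation, the analog of strcat (p, p). This is not Bstrlib's implementation; it is a minimal illustration with a locally declared struct and a hypothetical function name, showing that remembering one offset before the buffer is resized is enough to make the aliased case behave as if the source had been copied first.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct tagbstring { int mlen; int slen; unsigned char * data; };

/* Hypothetical alias tolerant append: dst := dst + src. Returns 0 on
   success and -1 on error, leaving dst unchanged in the latter case. */
static int append_alias_safe (struct tagbstring * dst,
                              const struct tagbstring * src) {
    uintptr_t d, s;
    size_t off = 0;
    int aliased, add, need;
    const unsigned char * from;

    if (dst == NULL || src == NULL || dst->data == NULL || src->data == NULL)
        return -1;
    if (dst->slen < 0 || src->slen < 0 || dst->mlen < dst->slen) return -1;

    add = src->slen;              /* captured before dst is modified */
    if (add > INT_MAX - dst->slen - 1) return -1;
    need = dst->slen + add + 1;   /* room for a trailing '\0' */

    /* Does src's data start inside dst's buffer? If so remember the
       offset, because realloc may move the buffer. */
    d = (uintptr_t) dst->data;
    s = (uintptr_t) src->data;
    aliased = (s >= d && s - d < (uintptr_t) dst->mlen);
    if (aliased) off = (size_t) (s - d);

    if (need > dst->mlen) {
        unsigned char * p = realloc (dst->data, (size_t) need);
        if (p == NULL) return -1;
        dst->data = p;
        dst->mlen = need;
    }

    from = aliased ? dst->data + off : src->data;
    memmove (dst->data + dst->slen, from, (size_t) add);
    dst->slen += add;
    dst->data[dst->slen] = '\0';
    return 0;
}
```

The extra cost over a non-aliasing version is one pointer comparison and one subtraction. Note that in this sketch a separate reference header pointing into dst may be left dangling if the buffer moves, so such references would need to be rebuilt after the call.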
Reentrancy
----------
Nearly every function in Bstrlib is a leaf function, and is completely
reentrant with the exception of writing to common bstrings. The split
functions which use a callback mechanism require only that the source string
not be destroyed by the callback function unless the callback function returns
with an error status (note that Bstrlib functions which return an error do
not modify the string in any way.) The string can in fact be modified by the
callback and the behaviour is deterministic. See the documentation of the
various split functions for more details.
Undefined scenarios
-------------------
One of the basic important premises for Bstrlib is not to increase the
propagation of undefined situations from parameters that are otherwise legal
in and of themselves. In particular, except for extremely marginal cases, usages
of bstrings that use the bstring library functions alone cannot lead to any
undefined action. But due to C/C++ language and library limitations, there
is no way to define a non-trivial library that is completely without
undefined operations. All such possible undefined operations are described
below:
1) bstrings or struct tagbstrings that are not explicitly initialized cannot
be passed as a parameter to any bstring function.
2) The members of the NULL bstring cannot be accessed directly. (Though all
APIs and macros detect the NULL bstring.)
3) A bstring whose data member has not been obtained from a malloc or
compatible call and which is write accessible passed as a writable
parameter will lead to undefined results. (i.e., do not writeAllow any
constructed bstrings unless the data portion has been obtained from the
heap.)
4) If the headers of two strings alias but are not identical (which can only
happen via a defective manual construction), then passing them to a
bstring function in which one is writable is not defined.
5) If the mlen member is larger than the actual accessible length of the data
member for a writable bstring, or if the slen member is larger than the
readable length of the data member for a readable bstring, then the
corresponding bstring operations are undefined.
6) Any bstring definition whose header or accessible data portion has been
assigned to inaccessible or otherwise illegal memory clearly cannot be
acted upon by the bstring library in any way.
7) Destroying the source of an incremental split from within the callback
and not returning with a negative value (indicating that it should abort)
will lead to undefined behaviour. (Though *modifying* or adjusting the
state of the source data, even if those modifications fail within the
bstrlib API, has well defined behavior.)
8) Modifying a bstring which is write protected by direct access has
undefined behavior.
While this may seem like a long list, with the exception of invalid uses of
the writeAllow macro, and source destruction during an iterative split
without an accompanying abort, no usage of the bstring API alone can cause
any undefined scenario to occur. I.e., the policy of restricting usage of
bstrings to the bstring API can significantly reduce the risk of runtime
errors (in practice it should eliminate them) related to string manipulation
due to undefined action.
C++ wrapper
-----------
A C++ wrapper has been created to enable bstring functionality for C++ in the
most natural (for C++ programmers) way possible. The mandate for the C++
wrapper is different from the base C bstring library. Since the C++ language
has far more abstracting capabilities, the CBString structure is considered
fully abstracted -- i.e., hand generated CBStrings are not supported (though
conversion from a struct tagbstring is allowed) and all detectable errors are
manifest as thrown exceptions.
- The C++ class definitions are all under the namespace Bstrlib. bstrwrap.h
enables this namespace (with a using namespace Bstrlib; directive at the
end) unless the macro BSTRLIB_DONT_ASSUME_NAMESPACE has been defined before
it is included.
- Erroneous accesses result in an exception being thrown. The exception
parameter is of type "struct CBStringException" which is derived from
std::exception if STL is used. A verbose description of the error message
can be obtained from the what() method.
- CBString is a C++ structure derived from a struct tagbstring. An address
of a CBString cast to a bstring must not be passed to bdestroy. The bstring
C API has been made C++ safe and can be used directly in a C++ project.
- It includes constructors which can take a char, '\0' terminated char
buffer, tagbstring, (char, repeat-value), a length delimited buffer or a
CBStringList to initialize it.
- Concatenation is performed with the + and += operators. Comparisons are
done with the ==, !=, <, >, <= and >= operators. Note that == and != use
the biseq call, while <, >, <= and >= use bstrcmp.
- CBString's can be directly cast to const character buffers.
- CBString's can be directly cast to double, float, int or unsigned int so
long as the CBString is a decimal representation of that type (otherwise
an exception will be thrown). Converting the other way should be done with
the format(a) method(s).
- CBString contains the length, character and [] accessor methods. The
character and [] accessors are aliases of each other. If the bounds for
the string are exceeded, an exception is thrown. To avoid the overhead for
this check, first cast the CBString to a (const char *) and use [] to
dereference the array as normal. Note that the character and [] accessor
methods allow both reading and writing of individual characters.
- The methods: format, formata, find, reversefind, findcaseless,
reversefindcaseless, midstr, insert, insertchrs, replace, findreplace,
findreplacecaseless, remove, findchr, nfindchr, alloc, toupper, tolower,
gets, read are analogous to the functions that can be found in the C API.
- The caselessEqual and caselessCmp methods are analogous to biseqcaseless
and bstricmp functions respectively.
- Note that just like the bformat function, the format and formata methods do
not automatically cast CBStrings into char * strings for "%s"-type
substitutions:
    CBString w("world");
    CBString h("Hello");
    CBString hw;
    /* The casts are necessary */
    hw.format ("%s, %s", (const char *)h, (const char *)w);
- The methods trunc and repeat have been added instead of using pattern.
- ltrim, rtrim and trim methods have been added. These remove characters
from a given character string set (defaulting to the whitespace characters)
from either the left, right or both ends of the CBString, respectively.
- The method setsubstr is also analogous in functionality to bsetstr, except
that it cannot be passed NULL. Instead the method fill and the fill-style
constructor have been supplied to enable this functionality.
- The writeprotect(), writeallow() and iswriteprotected() methods are
analogous to the bwriteprotect(), bwriteallow() and biswriteprotected()
macros in the C API. Write protection semantics in CBString are stronger
than with the C API in that indexed character assignment is checked for
write protection. However, unlike with the C API, a write protected
CBString can be destroyed by the destructor.
- CBStream is a C++ structure which wraps a struct bStream (it's not derived
from it, since destruction is slightly different). It is constructed by
passing in a bNread function pointer and a stream parameter cast to void *.
This structure includes methods for detecting eof, setting the buffer
length, reading the whole stream or reading entries line by line or block
by block, an unread function, and a peek function.
- If STL is available, the CBStringList structure is derived from a vector of
CBString with various split methods. The split method has been overloaded
to accept either a character or CBString as the second parameter (when the
split parameter is a CBString any character in that CBString is used as a
separator). The splitstr method takes a CBString as a substring separator.
Joins can be performed via a CBString constructor which takes a
CBStringList as a parameter, or just using the CBString::join() method.
- If there is proper support for std::iostreams, then the >> and << operators
and the getline() function have been added (with semantics the same as
those for std::string).
Multithreading
--------------
A mutable bstring is kind of analogous to a small (two entry) linked list
allocated by malloc, with all aliasing completely under programmer control.
I.e., manipulation of one bstring will never affect any other distinct
bstring unless explicitly constructed to do so by the programmer via hand
construction or via building a reference. Bstrlib also does not use any
static or global storage, so there are no hidden unremovable race conditions.
Bstrings are also clearly not inherently thread local. So just like
char *'s, bstrings can be passed around from thread to thread and shared and
so on, so long as modifications to a bstring correspond to some kind of
exclusive access lock as should be expected (or if the bstring is read-only,
which can be enforced by bstring write protection) for any sort of shared
object in a multithreaded environment.
Bsafe module
------------
For convenience, a bsafe module has been included. The idea is that if this
module is included, inadvertent usage of the most dangerous C functions will
be overridden and lead to an immediate run time abort. Of course, it should
be emphasized that usage of this module is completely optional. The
intention is essentially to provide an option for creating project safety
rules which can be enforced mechanically rather than socially. This is
useful for larger, or open development projects where it's more difficult to
enforce social rules or "coding conventions".
Problems not solved
-------------------
Bstrlib is written for the C and C++ languages, which have inherent weaknesses
that cannot be easily solved:
1. Memory leaks: Forgetting to call bdestroy on a bstring that is about to be
unreferenced leaks memory, just as forgetting to call free on a heap buffer
that is about to be unreferenced does. Bstrlib itself, however, is leak free.
2. Read before write usage: In C, declaring an auto bstring does not
automatically fill it with legal/valid contents. This problem has been
somewhat mitigated in C++. (The bstrDeclare and bstrFree macros from
bstraux can be used to help mitigate this problem.)
Other problems not addressed:
3. Built-in mutex usage to automatically avoid all bstring internal race
conditions in multitasking environments: The problem with trying to
implement such things at this low a level is that it is typically more
efficient to use locks in higher level primitives. There is also no
platform independent way to implement locks or mutexes.
4. Unicode/wide character support.
Note that except for spotty support of wide characters, the default C
standard library does not address any of these problems either.
Configurable compilation options
--------------------------------
All configuration options are meant solely for the purpose of compiler
compatibility. Configuration options are not meant to change the semantics
or capabilities of the library, except where it is unavoidable.
Since some C++ compilers don't include the Standard Template Library and some
have the options of disabling exception handling, a number of macros can be
used to conditionally compile support for each of these:
BSTRLIB_CAN_USE_STL
- defining this will enable the use of the Standard Template Library.
Defining BSTRLIB_CAN_USE_STL overrides the BSTRLIB_CANNOT_USE_STL macro.
BSTRLIB_CANNOT_USE_STL
- defining this will disable the use of the Standard Template Library.
Defining BSTRLIB_CAN_USE_STL overrides the BSTRLIB_CANNOT_USE_STL macro.
BSTRLIB_CAN_USE_IOSTREAM
- defining this will enable the use of streams from class std. Defining
BSTRLIB_CAN_USE_IOSTREAM overrides the BSTRLIB_CANNOT_USE_IOSTREAM macro.
BSTRLIB_CANNOT_USE_IOSTREAM
- defining this will disable the use of streams from class std. Defining
BSTRLIB_CAN_USE_IOSTREAM overrides the BSTRLIB_CANNOT_USE_IOSTREAM macro.
BSTRLIB_THROWS_EXCEPTIONS
- defining this will enable the exception handling within bstring.
Defining BSTRLIB_THROWS_EXCEPTIONS overrides the
BSTRLIB_DOESNT_THROW_EXCEPTIONS macro.
BSTRLIB_DOESNT_THROW_EXCEPTIONS
- defining this will disable the exception handling within bstring.
Defining BSTRLIB_THROWS_EXCEPTIONS overrides the
BSTRLIB_DOESNT_THROW_EXCEPTIONS macro.
Note that these macros must be defined consistently throughout all modules
that use CBStrings including bstrwrap.cpp.
Some older C compilers do not support functions such as vsnprintf. This is
handled by the following macro variables:
BSTRLIB_NOVSNP
- defining this indicates that the compiler does not support vsnprintf.
This will cause bformat and bformata to not be declared. Note that
for some compilers, such as Turbo C, this is set automatically.
Defining BSTRLIB_NOVSNP overrides the BSTRLIB_VSNP_OK macro.
BSTRLIB_VSNP_OK
- defining this will disable the autodetection of compilers that do not
support vsnprintf.
Defining BSTRLIB_NOVSNP overrides the BSTRLIB_VSNP_OK macro.
Semantic compilation options
----------------------------
Bstrlib comes with very few compilation options for changing the semantics
of the library. These are described below.
BSTRLIB_DONT_ASSUME_NAMESPACE
- Defining this before including bstrwrap.h will disable the automatic
enabling of the Bstrlib namespace for the C++ declarations.
BSTRLIB_DONT_USE_VIRTUAL_DESTRUCTOR
- Defining this will make the CBString destructor non-virtual.
BSTRLIB_MEMORY_DEBUG
- Defining this will cause the bstrlib modules bstrlib.c and bstrwrap.cpp
to invoke a #include "memdbg.h". memdbg.h has to be supplied by the user.
Note that these macros must be defined consistently throughout all modules
that use bstrings or CBStrings including bstrlib.c, bstraux.c and
bstrwrap.cpp.
===============================================================================
Files
-----
bstrlib.c - C implementation of bstring functions.
bstrlib.h - C header file for bstring functions.
bstraux.c - C example that implements trivial additional functions.
bstraux.h - C header for bstraux.c
bstest.c - C unit/regression test for bstrlib.c
bstrwrap.cpp - C++ implementation of CBString.
bstrwrap.h - C++ header file for CBString.
test.cpp - C++ unit/regression test for bstrwrap.cpp
bsafe.c - C runtime stubs to abort usage of unsafe C functions.
bsafe.h - C header file for bsafe.c functions.
C projects need only include bstrlib.h and compile/link bstrlib.c to use the
bstring library. C++ projects need to additionally include bstrwrap.h and
compile/link bstrwrap.cpp. For both, there may be a need to make choices
about feature configuration as described in the "Configurable compilation
options" in the section above.
Other files that are included in this archive are:
license.txt - The 3 clause BSD license for Bstrlib
gpl.txt - The GPL version 2
security.txt - A security statement useful for auditing Bstrlib
porting.txt - A guide to porting Bstrlib
bstrlib.txt - This file
===============================================================================
The functions
-------------
extern bstring bfromcstr (const char * str);
Take a standard C library style '\0' terminated char buffer and generate
a bstring with the same contents as the char buffer. If an error occurs
NULL is returned.
So for example:
bstring b = bfromcstr ("Hello");
if (!b) {
    fprintf (stderr, "Out of memory");
} else {
    puts ((char *) b->data);
}
..........................................................................
extern bstring bfromcstralloc (int mlen, const char * str);
Create a bstring which contains the contents of the '\0' terminated
char * buffer str. The memory buffer backing the bstring is at least
mlen characters in length. If an error occurs NULL is returned.
So for example:
bstring b = bfromcstralloc (64, someCstr);
if (b) b->data[63] = 'x';
The idea is that this will set the 64th character of b to 'x' if b is at
least 64 characters long; otherwise the write lands in unused buffer space
and leaves the string's contents unchanged. This is well defined so long as
b was successfully created, since it will have been allocated with at least
64 characters.
..........................................................................
extern bstring blk2bstr (const void * blk, int len);
Create a bstring whose contents are described by the contiguous buffer
pointed to by blk with a length of len bytes. Note that this function
creates a copy of the data in blk, rather than simply referencing it.
Compare with the blk2tbstr macro. If an error occurs NULL is returned.
..........................................................................
extern char * bstr2cstr (const_bstring s, char z);
Create a '\0' terminated char buffer which contains the contents of the
bstring s, except that any contained '\0' characters are converted to the
character in z. This returned value should be freed with bcstrfree(), by
the caller. If an error occurs NULL is returned.
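The '\0' substitution can be illustrated without Bstrlib itself. What
follows is a minimal sketch in plain C, not the code from bstrlib.c; the
names block_to_cstr and block_to_cstr_demo are hypothetical, and the result
is released with plain free () rather than bcstrfree ().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a length-delimited block into a freshly
   malloc'ed, '\0' terminated buffer, replacing each embedded '\0' with z.
   Returns NULL on error; the caller frees the result with free (). */
char * block_to_cstr (const unsigned char * data, int len, char z) {
    char * r;
    int i;

    if (data == NULL || len < 0) return NULL;
    r = (char *) malloc ((size_t) len + 1);
    if (r == NULL) return NULL;
    for (i = 0; i < len; i++) {
        r[i] = (data[i] == '\0') ? z : (char) data[i];
    }
    r[len] = '\0';                 /* single terminator at the very end */
    return r;
}

/* Usage: the embedded '\0' becomes '?', so all 5 characters survive. */
void block_to_cstr_demo (void) {
    char * s = block_to_cstr ((const unsigned char *) "ab\0cd", 5, '?');
    assert (s != NULL && strcmp (s, "ab?cd") == 0);
    free (s);
}
```

Choosing a z that cannot occur in the data keeps the conversion reversible.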
..........................................................................
extern int bcstrfree (char * s);
Frees a C-string generated by bstr2cstr (). This is normally unnecessary
since it just wraps a call to free (), however, if malloc () and free ()
have been redefined as macros within the bstrlib module (via macros in
the memdbg.h backdoor) with some difference in behaviour from the std
library functions, then this allows a correct way of freeing the memory
that allows higher level code to be independent from these macro
redefinitions.
..........................................................................
extern bstring bstrcpy (const_bstring b1);
Make a copy of the passed in bstring. The copied bstring is returned if
there is no error, otherwise NULL is returned.
..........................................................................
extern int bassign (bstring a, const_bstring b);
Overwrite the bstring a with the contents of bstring b. Note that the
bstring a must be a well defined and writable bstring. If an error
occurs BSTR_ERR is returned and a is not overwritten.
..........................................................................
extern int bassigncstr (bstring a, const char * str);
Overwrite the bstring a with the contents of char * string str. Note that
the bstring a must be a well defined and writable bstring. If an error
occurs BSTR_ERR is returned and a may be partially overwritten.
..........................................................................
extern int bassignblk (bstring a, const void * s, int len);
Overwrite the bstring a with the contents of the block (s, len). Note that
the bstring a must be a well defined and writable bstring. If an error
occurs BSTR_ERR is returned and a is not overwritten.
..........................................................................
extern int bassignmidstr (bstring a, const_bstring b, int left, int len);
Overwrite the bstring a with the substring of bstring b
starting from position left and running for a length len. left and
len are clamped to the ends of b as with the function bmidstr. Note that
the bstring a must be a well defined and writable bstring. If an error
occurs BSTR_ERR is returned and a is not overwritten.
..........................................................................
extern bstring bmidstr (const_bstring b, int left, int len);
Create a bstring which is the substring of b starting from position left
and running for a length len (clamped by the end of the bstring b.) If
there was no error, the value of this constructed bstring is returned
otherwise NULL is returned.
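As an illustration of the clamping described for bmidstr, here is a minimal
sketch in plain C. It shows one plausible reading of "clamped" and is not
the code from bstrlib.c; the helper name clamp_span is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: clamp the span (left, len) to a string of length
   slen. On return *left lies in [0, slen] and *len is the number of
   characters actually covered (0 if the span misses the string). */
void clamp_span (int slen, int * left, int * len) {
    assert (slen >= 0 && left != 0 && len != 0);
    if (*len < 0) *len = 0;
    if (*left < 0) {            /* span starts before the string */
        *len += *left;          /* drop the part that falls off the front */
        *left = 0;
    }
    if (*left > slen) *left = slen;
    if (*len > slen - *left) *len = slen - *left;   /* trim the tail */
    if (*len < 0) *len = 0;
}
```

For example, with slen = 5 the span (3, 100) becomes (3, 2) and the span
(-2, 4) becomes (0, 2).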
..........................................................................
extern int bdelete (bstring s1, int pos, int len);
Removes characters from pos to pos+len-1 and shifts the tail of the
bstring starting from pos+len to pos. len must be positive for this call
to have any effect. The section of the bstring described by (pos, len)
is clamped to the boundaries of the bstring s1. The value BSTR_OK is returned
if the operation is successful, otherwise BSTR_ERR is returned.
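The shift of the tail can be sketched on a plain character buffer. This is
an illustration in plain C under the clamping rule stated above, not the
library's implementation; the helper name delete_span is hypothetical and
it returns the new length instead of BSTR_OK or BSTR_ERR.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: delete up to len characters starting at pos from
   data, which currently holds slen characters. Returns the new length. */
int delete_span (char * data, int slen, int pos, int len) {
    assert (data != NULL && slen >= 0);
    if (len <= 0) return slen;                 /* nothing to delete */
    if (pos < 0) { len += pos; pos = 0; }      /* clamp the front */
    if (len <= 0 || pos >= slen) return slen;  /* span misses the buffer */
    if (len > slen - pos) len = slen - pos;    /* clamp the tail */
    /* Move the tail starting at pos+len down to pos; regions may overlap. */
    memmove (data + pos, data + pos + len, (size_t) (slen - pos - len));
    return slen - len;
}
```

Applied to the 12 characters of "Hello, world" with pos = 5 and len = 2,
this leaves the 10 characters "Helloworld".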
..........................................................................
extern int bconcat (bstring b0, const_bstring b1);
Concatenate the bstring b1 to the end of bstring b0. The value BSTR_OK
is returned if the operation is successful, otherwise BSTR_ERR is
returned.
..........................................................................
extern int bconchar (bstring b, char c);
Concatenate the character c to the end of bstring b. The value BSTR_OK
is returned if the operation is successful, otherwise BSTR_ERR is
returned.
..........................................................................
extern int bcatcstr (bstring b, const char * s);
Concatenate the char * string s to the end of bstring b. The value
BSTR_OK is returned if the operation is successful, otherwise BSTR_ERR is
returned.
..........................................................................
extern int bcatblk (bstring b, const void * s, int len);
Concatenate a fixed length buffer (s, len) to the end of bstring b. The
value BSTR_OK is returned if the operation is successful, otherwise
BSTR_ERR is returned.
..........................................................................
extern int biseq (const_bstring b0, const_bstring b1);
Compare the bstring b0 and b1 for equality. If the bstrings differ, 0
is returned, if the bstrings are the same, 1 is returned, if there is an
error, -1 is returned. If the lengths of the bstrings are different, this
function has O(1) complexity. Contained '\0' characters are not treated
as a termination character.
Note that the semantics of biseq are not completely compatible with
bstrcmp because of its different treatment of the '\0' character.
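The difference in '\0' handling can be seen with standard C functions
alone. The sketch below is plain C written in the spirit of biseq, not the
library's code; the helper name blocks_equal is hypothetical and strcmp
stands in for a comparison that stops at '\0'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length-aware equality. All alen bytes are compared,
   so embedded '\0' characters are ordinary data. Returns 1 if equal and 0
   if not. */
int blocks_equal (const void * a, int alen, const void * b, int blen) {
    assert (alen >= 0 && blen >= 0);
    if (alen != blen) return 0;          /* O(1) exit on length mismatch */
    return memcmp (a, b, (size_t) alen) == 0;
}
```

The 5 byte blocks "ab\0cd" and "ab\0xy" are unequal according to
blocks_equal, while strcmp reports them as equal because it stops at the
first '\0'.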
..........................................................................
extern int bisstemeqblk (const_bstring b, const void * blk, int len);
Compare the beginning of bstring b with a block of memory of length len for
equality. If the beginning of b differs from the memory block (or if b
is too short), 0 is returned, if they are the same, 1 is returned,
if there is an error, -1 is returned.
..........................................................................
extern int biseqcaseless (const_bstring b0, const_bstring b1);
Compare two bstrings for equality without differentiating between case.
If the bstrings differ other than in case, 0 is returned, if the bstrings
are the same, 1 is returned, if there is an error, -1 is returned. If
the lengths of the bstrings are different, this function is O(1). '\0'
termination characters are not treated in any special way.
..........................................................................
extern int bisstemeqcaselessblk (const_bstring b0, const void * blk, int len);
Compare the beginning of bstring b0 with a block of memory of length len
for equality without differentiating between case. If the beginning of b0
differs from the memory block other than in case (or if b0 is too short),
0 is returned, if they are the same apart from case, 1 is returned, if
there is an error, -1 is returned.
..........................................................................
extern int biseqcstr (const_bstring b, const char *s);
Compare the bstring b and char * string s. The C string s must be '\0'
terminated at exactly the length of the bstring b, and the contents
between the two must be identical with the bstring b with no '\0'
characters for the two contents to be considered equal. This is
equivalent to the condition that their current contents will always be
equal when comparing them in the same format after converting one or the
other. If they are equal 1 is returned, if they are unequal 0 is
returned and if there is a detectable error BSTR_ERR is returned.
..........................................................................
extern int biseqcstrcaseless (const_bstring b, const char *s);
Compare the bstring b and char * string s. The C string s must be '\0'
terminated at exactly the length of the bstring b, and the contents
between the two must be identical except for case with the bstring b with
no '\0' characters for the two contents to be considered equal. This is
equivalent to the condition that their current contents will always be
equal ignoring case when comparing them in the same format after
converting one or the other. If they are equal, except for case, 1 is
returned, if they are unequal regardless of case 0 is returned and if
there is a detectable error BSTR_ERR is returned.
..........................................................................
extern int bstrcmp (const_bstring b0, const_bstring b1);
Compare the bstrings b0 and b1 for ordering. If there is an error,
SHRT_MIN is returned, otherwise a value less than or greater than zero,
indicating that the bstring pointed to by b0 is lexicographically less
than or greater than the bstring pointed to by b1 is returned. If the
bstring lengths are unequal but the characters up until the length of the
shorter are equal then a value less than, or greater than zero,
indicating that the bstring pointed to by b0 is shorter or longer than the
bstring pointed to by b1 is returned. 0 is returned if and only if the
two bstrings are the same. If the lengths of the bstrings are different,
this function is O(n). Like its standard C library counterpart, the
comparison does not proceed past any '\0' termination characters
encountered.
The seemingly odd error return value merely provides slightly more
granularity than the undefined situation given in the C library function
strcmp. The function otherwise behaves very much like strcmp().
Note that the semantics of bstrcmp are not completely compatible with
biseq because of its different treatment of the '\0' termination
character.
..........................................................................
extern int bstrncmp (const_bstring b0, const_bstring b1, int n);
Compare the bstrings b0 and b1 for ordering for at most n characters. If
there is an error, SHRT_MIN is returned, otherwise a value is returned as
if b0 and b1 were first truncated to at most n characters then bstrcmp
was called with these new bstrings as parameters. If the lengths of the
bstrings are different, this function is O(n). Like its standard C
library counterpart, the comparison does not proceed past any '\0'
termination characters encountered.
The seemingly odd error return value merely provides slightly more
granularity than the undefined situation given in the C library function
strncmp. The function otherwise behaves very much like strncmp().
..........................................................................
extern int bstricmp (const_bstring b0, const_bstring b1);
Compare two bstrings without differentiating between case. The return
value is the difference of the values of the characters where the two
bstrings first differ, otherwise 0 is returned indicating that the
bstrings are equal. If the lengths are different, then a difference from
0 is given, but if the first extra character is '\0', then it is taken to
be the value UCHAR_MAX+1.
..........................................................................
extern int bstrnicmp (const_bstring b0, const_bstring b1, int n);
Compare two bstrings without differentiating between case for at most n
characters. If the position where the two bstrings first differ is
before the nth position, the return value is the difference of the values
of the characters, otherwise 0 is returned. If the lengths are different
and less than n characters, then a difference from 0 is given, but if the
first extra character is '\0', then it is taken to be the value
UCHAR_MAX+1.
..........................................................................
extern int bdestroy (bstring b);
Deallocate the bstring passed. Passing NULL in as a parameter will have
no effect. Note that both the header and the data portion of the bstring
will be freed. No other bstring function which modifies one of its
parameters will free or reallocate the header. Because of this, in
general, bdestroy cannot be called on any declared struct tagbstring even
if it is not write protected. A bstring which is write protected cannot
be destroyed via the bdestroy call. Any attempt to do so will result in
no action taken, and BSTR_ERR will be returned.
Note to C++ users: Passing in a CBString cast to a bstring will lead to
undefined behavior (free will be called on the header, rather than the
CBString destructor.) Instead just use the ordinary C++ language
facilities to dealloc a CBString.
..........................................................................
extern int binstr (const_bstring s1, int pos, const_bstring s2);
Search for the bstring s2 in s1 starting at position pos and looking in a
forward (increasing) direction. If it is found then it returns with the
first position at or after pos where it is found, otherwise it returns BSTR_ERR.
The algorithm used is brute force; O(m*n).
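To make the O(m*n) bound concrete, here is a minimal brute-force forward
search in plain C. It is a sketch of the general technique, not the code
from bstrlib.c; the helper name find_forward is hypothetical and -1 stands
in for BSTR_ERR.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: search for the block (needle, nlen) in (hay, hlen)
   starting at pos and moving forward. Returns the first match position
   >= pos, or -1 if there is none. Each of up to hlen candidate positions
   costs up to nlen comparisons, hence O(hlen * nlen) in the worst case. */
int find_forward (const char * hay, int hlen, int pos,
                  const char * needle, int nlen) {
    int i;

    assert (hay != NULL && needle != NULL && hlen >= 0 && nlen >= 0);
    if (pos < 0) return -1;
    for (i = pos; i <= hlen - nlen; i++) {      /* each candidate start */
        if (memcmp (hay + i, needle, (size_t) nlen) == 0) return i;
    }
    return -1;
}
```

Searching "abcabc" for "bc" yields 1 from position 0 and 4 from position 2.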
..........................................................................
extern int binstrr (const_bstring s1, int pos, const_bstring s2);
Search for the bstring s2 in s1 starting at position pos and looking in a
backward (decreasing) direction. If it is found then it returns with the
first position at or before pos where it is found, otherwise it returns
BSTR_ERR.
Note that the current position at pos is tested as well -- so to be
disjoint from a previous forward search it is recommended that the
position be backed up (decremented) by one position. The algorithm used
is brute force; O(m*n).
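The note about pos being tested can be seen in a small backward search
written in plain C. This is a sketch of the technique under the semantics
described above, not the library's implementation; the helper name
find_backward is hypothetical and -1 stands in for BSTR_ERR.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: search for the block (needle, nlen) in (hay, hlen)
   starting at pos and moving backward. Returns the first match position
   <= pos, or -1 if there is none. */
int find_backward (const char * hay, int hlen, int pos,
                   const char * needle, int nlen) {
    int i;

    assert (hay != NULL && needle != NULL && hlen >= 0 && nlen >= 0);
    if (pos < 0 || pos > hlen) return -1;
    /* The last position at which the needle can still fit. */
    i = (pos > hlen - nlen) ? hlen - nlen : pos;
    for (; i >= 0; i--) {                   /* pos itself is tested first */
        if (memcmp (hay + i, needle, (size_t) nlen) == 0) return i;
    }
    return -1;
}
```

In "abcabc", a backward search for "bc" from position 4 returns 4 because
pos itself is tested; backing up to position 3 returns the earlier match
at 1.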
..........................................................................
extern int binstrcaseless (const_bstring s1, int pos, const_bstring s2);
Search for the bstring s2 in s1 starting at position pos and looking in a
forward (increasing) direction but without regard to case. If it is
found then it returns with the first position at or after pos where it is
found, otherwise it returns BSTR_ERR. The algorithm used is brute force;
O(m*n).
..........................................................................
extern int binstrrcaseless (const_bstring s1, int pos, const_bstring s2);
Search for the bstring s2 in s1 starting at position pos and looking in a
backward (decreasing) direction but without regard to case. If it is
found then it returns with the first position at or before pos where it is
found, otherwise it returns BSTR_ERR. Note that the current position at pos
is tested as well -- so to be disjoint from a previous forward search it
is recommended that the position be backed up (decremented) by one
position. The algorithm used is brute force; O(m*n).
..........................................................................
extern int binchr (const_bstring b0, int pos, const_bstring b1);
Search for the first position in b0 starting from pos or after, in which
one of the characters in b1 is found. This function has an execution
time of O(b0->slen + b1->slen). If such a position does not exist in b0,
then BSTR_ERR is returned.
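One way to reach an execution time that is a sum rather than a product of
the two lengths is a membership table indexed by character value. The
sketch below is plain C showing that technique; it is not taken from
bstrlib.c, the helper name find_any_of is hypothetical, and -1 stands in
for BSTR_ERR.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: first position >= pos in (s, slen) holding any
   character from (set, setlen), or -1 if there is none. Building the table
   costs O(setlen) and the scan costs O(slen), hence O(slen + setlen). */
int find_any_of (const unsigned char * s, int slen, int pos,
                 const unsigned char * set, int setlen) {
    unsigned char member[UCHAR_MAX + 1] = { 0 };
    int i;

    assert (s != 0 && set != 0 && slen >= 0 && setlen >= 0);
    if (pos < 0) return -1;
    for (i = 0; i < setlen; i++) member[set[i]] = 1;   /* mark the set */
    for (i = pos; i < slen; i++) {
        if (member[s[i]]) return i;                    /* O(1) lookup */
    }
    return -1;
}
```

With the 11 characters of "hello world" and the set "ow", a search from
position 0 yields 4 and a search from position 5 yields 6.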
..........................................................................
extern int binchrr (const_bstring b0, int pos, const_bstring b1);
Search for the last position in b0 no greater than pos, in which one of
the characters in b1 is found. This function has an execution time
of O(b0->slen + b1->slen). If such a position does not exist in b0,
then BSTR_ERR is returned.
..........................................................................
extern int bninchr (const_bstring b0, int pos, const_bstring b1);
Search for the first position in b0 starting from pos or after, in which
none of the characters in b1 is found and return it. This function has
an execution time of O(b0->slen + b1->slen). If such a position does
not exist in b0, then BSTR_ERR is returned.
..........................................................................
extern int bninchrr (const_bstring b0, int pos, const_bstring b1);
Search for the last position in b0 no greater than pos, in which none of
the characters in b1 is found and return it. This function has an
execution time of O(b0->slen + b1->slen). If such a position does not
exist in b0, then BSTR_ERR is returned.
..........................................................................
extern int bstrchr (const_bstring b, int c);
Search for the character c in the bstring b forwards from the start of
the bstring. Returns the position of the found character or BSTR_ERR if