2018-1 Object Databases.fex
introduction I
====
approach a new project:
show the problem first
identify MVP (whole solution stack)
then present a solution
product owner:
lives for project
has background info (technology stack, DevOps)
able to define iterations (specifically MVP)
supports other developers (chooses the right architecture)
basic implementation tips:
show how UI is supposed to look with mockups to customer
model database close to real world, as this does not change
OO vs database:
class vs relation
object vs row
value vs cell
attribute vs domain
programming paradigms:
declarative programming (functional, logical)
imperative programming (procedural, object-oriented)
introduction II
====
motivation:
simplified application development
OO data should be fully persistable
database should benefit from OO advantages
simplify design & evolution
no translation at run time
orthogonal persistence:
type orthogonality:
all objects can be persisted, regardless of type
persistence by reachability:
identifying persistent objects not related to type system
lifetime of object determined by reachability from root elements
persistence independence:
manipulation of long/short term data looks same
persistence strategies:
by inheritance (extend base persistent class)
by instantiation (pass persistence info at construction)
by reachability (if reachable from other persistent object)
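A minimal sketch of persistence by reachability, assuming a hypothetical `Node` class that lists its references explicitly (a real object database would discover references itself, for example via reflection). Everything reachable from a root is treated as persistent, anything else stays transient.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class Reachability {
    // Collects every object reachable from the roots.
    // Membership is checked by identity, so cycles in the graph terminate.
    static Set<Node> persistentSet(List<Node> roots) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            Node n = todo.pop();
            if (seen.add(n)) {
                todo.addAll(n.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node child = new Node("child");
        Node orphan = new Node("orphan");
        root.refs.add(child);
        child.refs.add(root); // cycle back to the root

        Set<Node> persistent = persistentSet(List.of(root));
        System.out.println(persistent.size());           // 2
        System.out.println(persistent.contains(orphan)); // false
    }
}

// Hypothetical object type used only for this illustration.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}
```

The orphan is never referenced from a root, so it would not be stored.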
SQL:
data definition language (DDL):
definition of data models
relations, attributes, keys
modify tables, attributes, keys
"class definition"
data manipulation language (DML):
creating & management of data
relation tuple, predicates
execute inserts, updates, deletes
"writing of values"
query language (QL):
concepts supporting retrieval of data
query, predicate, query result
use projections, selections, joins
"reading of properties"
types:
specification:
properties (attributes, relationships)
operations
exceptions
interface:
defines abstract behaviour
class:
defines abstract behaviour & state
literal (struct, int):
defines abstract state
implementation:
language binding, multiple per specification
representation:
data structures motivated by abstract state
instance variable for each abstract variable
methods:
procedure bodies motivated by abstract behaviour
possibly private methods not included in spec
subtyping (relationships):
is-a (behaviour)
extends (state & behaviour)
ODMG standard
======
object management group (OMG):
tools & architecture for OO development
distributed object management
created unified modelling language (UML)
informal standards body for major vendors
promotes portability/interoperability
does not develop products
object data management group (ODMG):
complementary to OMG
adds data management support
object definition language (ODL)
object query language (OQL)
bindings to java, smalltalk, C++
object model:
based on OMG object model
objects have unique identifier, literals do not
state defined by properties
behaviour defined by operations
categorization possible, common properties & operations
extent:
set of all active instances of type
collections:
set (unordered, no duplicates)
bag (unordered, with duplicates)
list (ordered, insert elements)
array (ordered, replace elements)
dictionary (key-value store)
collection operations:
subset only for sets
union, intersect, difference only for bags, sets
relationship:
many-to-many (collections for source, target)
many-to-one (collection for source, class inverse)
one-to-one (class for source, target)
no ternary
persistence:
by reachability
database exposes root objects, schema, types
other concepts:
database operations, locking, transactions, metadata
built in dates, times, intervals
object definition language (ODL):
supports constructing ODMG models
compatible with OMG interface definition language (IDL)
defines name, extent, exceptions, attributes, relationships
define method signatures (to be implemented later)
object query language (OQL):
looks like SQL
can use path expression like me.neighbour.name
not complete, no explicit update
returns bag, set with distinct, list with order by
can use subqueries
aggregation for collections (AVG, SUM, MIN, COUNT)
union, intersection, except for sets
can flatten collections
OQL example:
select values from collections where condition
where exists b in books:: b in a.books
where for all p in a.authors:: p.year < 1000
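The two quantifiers have close analogues in Java streams: `exists` behaves like `anyMatch` and `for all` like `allMatch`. The sketch below is an adaptation with hypothetical `Author` and `Book` classes and made-up data; it is not OQL and not tied to any ODMG binding.

```java
import java.util.List;
import java.util.stream.Collectors;

final class QuantifierDemo {
    // select a.name from authors a where exists b in wanted: b in a.books
    static List<String> authorsWithAnyOf(List<Author> authors, List<Book> wanted) {
        return authors.stream()
                .filter(a -> wanted.stream().anyMatch(a.books::contains))
                .map(a -> a.name)
                .collect(Collectors.toList());
    }

    // select a.name from authors a where for all b in a.books: b.year < limit
    static List<String> authorsWithAllBefore(List<Author> authors, int limit) {
        return authors.stream()
                .filter(a -> a.books.stream().allMatch(b -> b.year < limit))
                .map(a -> a.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Book old = new Book("old", 900);
        Book recent = new Book("recent", 2018);
        List<Author> authors = List.of(
                new Author("alice", List.of(old)),
                new Author("bob", List.of(old, recent)),
                new Author("carol", List.of()));

        System.out.println(authorsWithAnyOf(authors, List.of(recent))); // [bob]
        System.out.println(authorsWithAllBefore(authors, 1000));        // [alice, carol]
    }
}

// Hypothetical model classes for the illustration.
final class Book {
    final String title;
    final int year;

    Book(String title, int year) {
        this.title = title;
        this.year = year;
    }
}

final class Author {
    final String name;
    final List<Book> books;

    Author(String name, List<Book> books) {
        this.name = name;
        this.books = books;
    }
}
```

Note that a universal quantifier over an empty collection is true, which is why the author without books qualifies in the second query.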
object oriented databases (OODB)
=====
connection between OO world & database
avoids mismatch between objects/relations
provides a uniform data model
combines features & properties of both worlds
managing object data, making OO languages persistent
OODB manifesto:
define features, optional characteristics, open choices
but important properties missed (according to relational db people)
features of OO systems:
complex objects:
complex objects formed from simpler ones, constructors
tuples (represent entities & attributes)
sets (collection of entities)
lists (capture order)
need transitive retrieval, CRUD, copy
need identity (same GUID), equality (same state)
db only knows sets, atomic tuples
identity:
if GUID or state matches then equal
sharing with references
shallow & deep equality
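The difference between identity, shallow equality and deep equality can be sketched in plain Java. The `Author` and `Address` classes are hypothetical, and Java references stand in for the object identifiers a database would use.

```java
final class EqualityDemo {
    // Identity: both variables refer to the very same object.
    static boolean identical(Author a, Author b) {
        return a == b;
    }

    // Shallow equality: same values, and referenced objects must be identical.
    static boolean shallowEqual(Author a, Author b) {
        return a.name.equals(b.name) && a.address == b.address;
    }

    // Deep equality: referenced objects are compared by their state as well.
    static boolean deepEqual(Author a, Author b) {
        return a.name.equals(b.name) && a.address.city.equals(b.address.city);
    }

    public static void main(String[] args) {
        Address shared = new Address("Zurich");
        Author a1 = new Author("Ada", shared);
        Author a2 = new Author("Ada", shared);                // shares the address object
        Author a3 = new Author("Ada", new Address("Zurich")); // owns an equal copy

        System.out.println(identical(a1, a2));    // false
        System.out.println(shallowEqual(a1, a2)); // true
        System.out.println(shallowEqual(a1, a3)); // false
        System.out.println(deepEqual(a1, a3));    // true
    }
}

// Hypothetical model classes for the illustration.
final class Address {
    final String city;

    Address(String city) {
        this.city = city;
    }
}

final class Author {
    final String name;
    final Address address;

    Author(String name, Address address) {
        this.name = name;
        this.address = address;
    }
}
```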
encapsulation:
object data & methods implement interface
state only modified through public interface
data exposed for declarative queries
types:
define object properties, static safety
object structure & behaviour, separated from interface
generalizations:
enable better modelling, reuse, semantic complexity
inheritance (attributes & methods from superclass)
specialize or generalize
inheritance:
substitution (more operations, based on behaviour)
inclusion (based on structure)
constraint (like inclusion, sub is constraint on super)
specialization (sub contains more specific info)
method overriding:
redefine, specialize method in sub
method overloading:
multiple versions of method exist
late binding:
appropriate method call is selected at runtime
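A short Java sketch of the three notions above, using hypothetical `Publication` and `Book` classes. Overridden methods are selected at run time by the object's actual class (late binding), while overloads are selected at compile time by the declared type of the argument.

```java
final class BindingDemo {
    // Two overloads: the compiler picks one from the static argument type.
    static String kind(Publication p) {
        return "Publication overload";
    }

    static String kind(Book b) {
        return "Book overload";
    }

    public static void main(String[] args) {
        Publication p = new Book(); // declared type Publication, runtime type Book

        System.out.println(p.describe());     // book (late binding picks the override)
        System.out.println(kind(p));          // Publication overload (static resolution)
        System.out.println(kind(new Book())); // Book overload
    }
}

// Hypothetical class hierarchy for the illustration.
class Publication {
    String describe() {
        return "publication";
    }
}

class Book extends Publication {
    @Override
    String describe() { // overriding: redefines the inherited method
        return "book";
    }
}
```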
computational completeness:
any computable function expressible
extensibility:
devs can add native types to database
features of db systems:
persistence:
data survives program execution
orthogonal implicit persistence
orthogonality of type system & persistence (all can be saved)
efficiency:
secondary storage (indexes, buffers) optimize
concurrency:
multiuser, serialization
atomicity, consistency, isolation, durability
reliability:
resilient to user, software, hardware failures
transactions, snapshots, logging
declarative query language:
high level (conciseness of non-trivial tasks)
efficient execution (can be optimized further)
application independent (any database may execute it)
say "what" not "how"
optional characteristics:
multiple inheritance, type checking/inference
distribution, versions, long/nested transactions
open choices:
program paradigm (declarative vs imperative)
representation system (sets, lists, more?)
type system (generics?)
uniformity (type, method as objects?)
more tools:
database administration
view definition / view derived data
objects with roles (dynamically add/remove, like traits)
database evolution (migrate gracefully)
set integrity, semantic, evolution constraints
define, modify & enforce constraints
technical overview:
architecture:
vary greatly, not simple server-client model
all do caching, locking for query/transaction processing
need lifecycle management for objects
granularity:
per object/page/container
different runtime / performance characteristics
querying:
try to remove join concept and replace with relationships
performance depends on execution place, flexibility of query, indexes
identity management:
for relationships, uniqueness
physical for fast dereferencing but limited flexibility
logical for immutability but needs access translation
object relational mapping
========
mappings:
db <-> program (interface)
db <-> world (enterprise modelling)
program <-> world (simulation)
problems:
object identity (serial/read out produces duplicates)
multiple models (db, program, world)
impedance mismatch (map OO to database)
implement transformation (design time)
execute transformation (run time)
application-specific transformation (therefore reimplement often)
relational database:
ok for small applications
not ok for large with inheritance, multi-value attributes
object relational mapping (ORM):
map OO to relational model
persistence-related tasks already implemented
persistence API (java, annotations define mapping)
set up ORM:
top-down (oop -> mapping -> rdbms)
bottom-up (oop <- mapping <- rdbms)
inside-out (oop <- mapping -> rdbms)
outside-in (oop -> mapping <- rdbms)
hibernate:
java to sql, lots of backends supported
requirements:
ensure there is a no-argument constructor
create mapping (id, property, set (key, relationship))
configuration:
connect to database, specify name & DBMS
session factory:
factory = new Configuration().configure().buildSessionFactory()
factory.close()
session:
session = factory.openSession()
session.beginTransaction()
save, close, query, update, delete
session.beginTransaction()
session.createQuery("FROM persons")
session.getTransaction().commit()
session.close()
associations:
unidirectional, bidirectional, ordered
one-to-one, many-to-one, many-to-many
inheritance:
table per class hierarchy (sparse, subclass)
table per subclass (duplicated fields, joined-subclass)
table per concrete class (no abstract, union-subclass)
strategy can be defined for each part of application
annotations:
replace xml with annotations, same keywords
solved problems:
object identity
implementation transformation
application specific transformation
remaining problems:
impedance mismatch
multiple models
transformation at run time
annotation maintenance (new)
discussion:
good at design time, bad at run time
android
======
application model:
activities for UI
services for computation
providers for data
intents request use of application components
manifest file exposes components & defines start activity
lifecycle:
activity reacts to state changes
data management:
SQLiteOpenHelper creates SQLiteDatabase
Cursor c = db.rawQuery("SELECT * FROM persons", null)
c.moveToNext(), c.isLast(), c.getString(2), c.getInt(3)
ContentValues object for CRUD (simple map)
content resolver:
reacts to URIs like content:://ch.acm.personprovider/pictures/12
content resolver invokes correct provider
content provider:
encapsulates data management, can be exposed
CRUD, getType() methods, uses Cursor, URI, ContentValues
example location:
multiple providers (GPS, WLAN)
listener waits for updates
manager chooses best provider, has last known location
lack of:
orthogonal persistence (type orthogonality, independence, identity)
completeness (definition not stored; versioning problems)
scalability (entire object graphs need to be persisted)
db4o
====
object-based architecture w/ physical identity
meta:
open source native object database for .NET, java
key features:
no conversion/mapping needed
no changes to objects to be persistent
local or client/server mode, single line of code
ACID transactions
caching & integration in native garbage collection
architecture:
file/in-memory database
I/O adapter
ACID transactional slots
object part:
marshaller
reference system, reflector, class metadata
api
query part:
class/field index b-trees
index query processor
SODA query processor
native & SODA queries, query by example
object container:
connection to database
unit of work; owns one transaction
manages object identities, loads/unloads objects
starts new transaction after commit/abort
commits implicitly after closing container
persist:
store(), delete()
on create persistence by reachability
on update depth is 1 per default, only primitive values
on delete no cascade per default
objects linked by weak reference
config.common().objectClass(Author.class).cascadeOnDelete(true)
retrieve:
by example (findBy, pass partly filled out POCO)
native (using predicates)
soda (descend("year"), constrain(class), uses the graph)
consistency:
refresh() syncs DB state, call after delete()
database-aware collections
transparent persistence:
let objects implement Activatable interface
objects bound to framework on retrieve
modify at will, then call commit to save changes
activation:
only loaded to certain depth, fields set to default otherwise
occurs on collection element access, or explicit activate()
depth tradeoff is manual activate() vs heavy memory usage
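The effect of an activation depth can be illustrated without db4o. In this sketch the `NAMES` array is a hypothetical stand-in for the database file, and `load` / `activate` are illustrative helpers, not db4o API. Objects beyond the configured depth are instantiated but stay hollow, with their fields at default values, until they are activated explicitly.

```java
final class ActivationDemo {
    // Hypothetical stand-in for the database file: item i references item i + 1.
    static final String[] NAMES = {"a", "b", "c", "d"};

    // Instantiates item `id` and activates it down to `depth` levels.
    static Item load(int id, int depth) {
        if (id >= NAMES.length) {
            return null; // end of the chain
        }
        Item item = new Item(id);
        activate(item, depth);
        return item;
    }

    // Fills the fields of `item`; referenced objects get one level less.
    static void activate(Item item, int depth) {
        if (depth <= 0) {
            return; // stays hollow: fields keep their default values
        }
        item.name = NAMES[item.id];
        item.next = load(item.id + 1, depth - 1);
    }

    public static void main(String[] args) {
        Item head = load(0, 2);
        System.out.println(head.name);           // a
        System.out.println(head.next.name);      // b
        System.out.println(head.next.next.name); // null (beyond activation depth)

        activate(head.next.next, 1);             // explicit activation
        System.out.println(head.next.next.name); // c
    }
}

// Hypothetical persistent class for the illustration.
final class Item {
    final int id;
    String name; // null until activated
    Item next;   // null until activated

    Item(int id) {
        this.id = id;
    }
}
```

A larger depth avoids the explicit `activate` calls but pulls more of the graph into memory at once, which is the tradeoff named above.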
transparent activation:
on property set, activate property
database registers itself on instance creation
byte code insertion does this automatically
transaction isolation levels:
read uncommitted (even uncommitted values can be read)
read committed (only committed values can be read)
repeatable read (always same value read)
serializable (possible serial execution order)
transactions:
thread-safe, but single-thread core
no data loss, automatic recovery on system failure
rollback discards changes, call refresh() to clean up
read-committed (only committed values can be read)
collision detection:
peekPersisted() to get unbound instance of db version
get read committed or stored values (configurable)
collision avoidance:
db.ext().setSemaphore(GUID, MAX_TIME_WAIT)
discussion:
no impedance mismatch
orthogonal persistence:
independence (yes)
data type orthogonality (yes)
identification (yes)
but explicit store/retrieve application logic
issues:
depths of activation/deletion/update is a new burden
lack of synchronization on delete/update
"transparency" contradicts type orthogonality
configuration & tuning:
defragment:
remove unused fields, classes, meta data
compact database file
statistics:
log query behaviour & performance
log IO/network activity
log:
get all objects & classes in db file
indexes:
optimize query evaluation
tradeoff between query & modifications
B-trees on single fields; automatic or explicit
speed tuning:
object loading:
configure activation depths (less loaded initially)
use multiple containers (less complex containers)
disable weak references (no lazy loading)
database tests:
disable database schema change detection
disable testing of classes at startup (less validation)
query evaluation:
set indexes appropriately
optimize native queries
distribution:
embedded mode:
clients use same VM with db file
direct file access (single thread, same user)
client session (multiple threads, same user)
client/server mode:
clients use multiple VM, connect to server with db
can only use methods from object containers
use "out-of-band" signalling to transfer other messages
replication:
multiple servers, redundant copies
snapshot (periodically, single-master)
transactional (operation based, immediately)
merge (clients send changes to master, periodically or instant)
developer defines replication with masters/slaves
replication:
separated from core
bridge between db4o & relational databases
uni/bidirectional replication between hibernate, db4o
transfers data between providers
steps:
configure to use UUIDs & commit timestamps
create replication object & define conflict handler
configure direction (bidirectional is default)
call replicate(myObj), commit() on each object (transactional)
call objectsChanged and then replicate(iterator) (snapshot)
call close()
callbacks:
event triggers (activate, deactivate, new, update, delete)
"can" prefix called before, "on" prefix called afterwards
implement as much as needed
use case:
record/prevent updates
check integrity / set default values
create indexes dynamically if used often
control object instantiation:
can be configured per class/project
bypassing constructor (default method):
if no constructor works, if framework supports
constructor usage:
all constructors are tested with default values
first one which works is used from now on
if none found, object is not persisted
translator:
implement ObjectConstructor interface
convert object to custom entity (like Object)
type handlers:
like translators, but at lower level
translator with byte arrays, handler converts objects
Versant
=====
object-based architecture w/ logical identity
meta:
commercial OODBMS
object database for java
company:
now owned by actian
market leader in ODBMS
telecom, military, financial, transportation
architecture:
RAID, NAS, raw devices/filesystems
virtual system layer
versant server (logical, physical log file)
versant network layer
versant manager
C, C++, java interface
languages:
java versant interface (JVI)
versant query language (VQL)
dual cache:
client has object cache
server has page cache (for each db)
database volumes:
system volume (class descriptions, object instances)
data volumes (increase capacity, optional)
logical log volume (transactions, redo-info)
physical log volume (physical data information)
versant manager:
manipulates, caches, provides, marshals objects
transaction management
distributes requests for queries, updates to server
object descriptor table (ODT):
logical object identifier (LOID) -> memory, db location
used on property access, defines if retrieved from memory or db
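A rough sketch of the lookup idea behind such a table, assuming a simplified model in which one map plays the database and another the client-side table. The class and field names are hypothetical and not part of the Versant API.

```java
import java.util.HashMap;
import java.util.Map;

final class OdtDemo {
    // Hypothetical stand-in for the server-side database, keyed by LOID.
    final Map<Long, String> database = new HashMap<>();
    // Descriptor table: LOID -> object already present in client memory.
    final Map<Long, String> odt = new HashMap<>();
    int serverFetches = 0;

    // Resolves a LOID: served from memory if known, fetched from the db otherwise.
    String dereference(long loid) {
        String cached = odt.get(loid);
        if (cached != null) {
            return cached;
        }
        serverFetches++;
        String fetched = database.get(loid);
        if (fetched != null) {
            odt.put(loid, fetched);
        }
        return fetched;
    }

    public static void main(String[] args) {
        OdtDemo client = new OdtDemo();
        client.database.put(42L, "Florian");

        System.out.println(client.dereference(42L)); // Florian (fetched from db)
        System.out.println(client.dereference(42L)); // Florian (served from memory)
        System.out.println(client.serverFetches);    // 1
    }
}
```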
versant server:
updates, caches, retrieves objects
defines transactions, locks objects
logging, recovering
index maintenance
thread architecture:
clients:
multiple session with own cache
have multiple assigned client threads
servers:
multiple server threads process client requests
access page cache, respect its lock table
log buffer thread with async IO of uncommitted writes
background page flusher which writes modified pages
java versant interface (JVI):
store java objects (fine with GC, multithreading)
client-server architecture (local cache, server queries)
sits below java VM
fundamental layer:
database-centric, handlers manipulate objects
class & attribute builders define classes
create handles with LOID, new instances
use handle to get/put values
fundamental query/result
examples:
CODE_START
//schema definition
AttrString name = session.newAttrString("name");
AttrBuilder nameBuilder = session.newAttrBuilder(name);
var attrBuilders = session.withAttrBuilders({ nameBuilder })
ClassHandle person = attrBuilders.defineClass("Person");
person.createIndex("name", Constants.UNIQUE_BTREE_INDEX);
//data manipulation
person = session.locateClass("Person");
Handle florian = person.makeObject();
florian.put(name, "Florian");
String val = florian.get(name);
String loid = florian.asString();
florian = session.newHandle(loid);
//querying
FundQuery query = new FundQuery(session, "select name from Person");
FundQueryResult result = query.execute();
Handle resultSetMember = result.next();
CODE_END
transparent layer:
language-centric, maps classes & attributes to fundamentals
persistent java object caching & retrieval
first class object (FCO):
have LOID, save, query, retrieve individually
changes saved automatically, applied on commit
references to other FCO always valid
transient fields not in db
deleteObject() for db, collected by GC later (then finalize() called)
second class object (SCO):
saved as part of FCO, can't be queried
java byte stream if no versant type
transient fields not in db
FCO references serialized separately
changes applied only if owner marked as dirty on commit
if in two FCO, will be two different instances after fetch
delete implicitly by reference removal
persistence categories:
for FCO (p, c) possible, marked dirty on modification
parent class must be same category
(p) always, new instances directly persistent
(c) capable, makeRoot(), makePersistent() or reachable persistence
for SCO (d, a, n) possible
(d) transparent dirty owner, sets owner as dirty
(a) persistence aware, can modify FCO, must call dirtyObject() explicitly
(n) not persistent, can't access fields of persistent object
persistence model:
persistence by reachability
can elect named roots of graphs for retrieval
navigate starting at identity, root, class, query
versant transparently locks & retrieves
example:
CODE_START
TransSession session = new TransSession("myDB");
Set<P> ps = session.findRoot("rootName");
Person florian = new Author("Florian");
ps.get(0).addAuthor(florian);
session.commit(); session.endSession();
CODE_END
ODMG:
language-centric, transaction-model, collections
are FCO, follow ODMG standard
queries:
additional functionality, iff persistent collection, (p) objects
existsElement, query, select, selectElement
only elements of collection queried
VQL:
complex expressions, server-side sorting, indexing
query string which is compiled, optimized, executed on server
parametrization with $ sign, late binding, can rebind
example:
CODE_START
Publication pub = new Publication("Web 2.0 Survey");
String q = "select name from Author where name = $name";
Query query = new Query(session, q);
query.bind("name", "Stefania Leone");
QueryResult result = query.execute();
q = "select name from Author where Author::Books subset_of $books";
query = new Query(session, q);
query.bind("books", new [] {florian});
CODE_END
application development:
persistence aware java classes
deal with sessions, transactions & concurrency
specify persistence category for classes
enhancer performs byte-code changes
create db & run application
byte code enhancement:
create object in db on first instance construct
read/write objects, attributes to/from db
FCO pointers evaluated using ODT
sessions:
all actions must be performed in sessions
access to db, methods, data types, persistent objects
multiple sessions possible
must close sessions explicitly
client session has object cache, ODT
server session has page caches in shared memory
transactions:
always in transaction, commit/rollback starts new one
endSession commits last one
atomic, consistent, isolated, durable, coordinated (with locking), 2PC
commit() flushes cache, releases locks
checkpointCommit() retains caches, locks
commitAndRetain() retains caches, releases locks
object lifecycle:
creation of persistent objects (memory, versant cache)
commit (data written to db, hollow proxy remains in memory)
rollback (new objects will be dropped)
query (evaluated on server, proxy created for each result)
access (fetch or deserialize object)
JVI client cache loader:
client-side object cache, server tracks state of clients
contains query results, navigation results
dereference consists of RPC, object lookup, IO
improve efficiency:
vendor specific batch loading
configurable strategies:
breadth (other trees, same level), depth (deeper level), path loading
collections:
standard collections supported (list, array, hashtable)
FCO collections (VVector, VHashtable)
SCO collections (DVector)
FCO large collections (LargeVector, fine-grained locking)
ODMG collections
event notification:
from db to registered clients
class events (CRUD of any instance)
object events (CRUD of certain element(s))
transaction demarcation (begin/end transaction)
user-defined events
event channels:
register listeners to channels
global namespaces of channels over applications
class, object, query based
EventClient, ChannelBuilder
persistent object hooks:
at any sort of state changes
transient attributes, caches, housekeeping of integrity
activate, deactivate, pre/post read/write, delete
can change other objects in hooks
schema evolution:
add/rename/remove leaf classes
change class methods, attributes
does lazy updates of instances
polymorphic indexes:
enhance performance of retrieving object with its subclasses
social network analysis
====
social networks:
connected people (calls, chats) represented as graphs
calculates degree, closeness, betweenness centrality
find out:
key persons of a group of people
message traversal through group
communication patterns
nodes:
key/value pair of people
edges:
key/value with properties & assigned weight
uni/bidirectional, explicit/implicit, short/long, single/multiple
traffic with particular keyword
work/behaviour patterns
activity such as transfers, payments
graph:
path lengths:
150 max social relationships (Dunbar's number)
4.56 avg distance of publications connected to Erdős (Erdős number)
2.946 avg distance of movies made with Kevin Bacon (Bacon number)
6 avg distance to know everyone in the world
node properties:
degree centrality:
number of direct connections of a node
betweenness centrality:
between two important nodes
high influence over what flows through the network
closeness centrality:
node with shortest paths to all others
can monitor information flow best
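Degree and closeness centrality can be computed with a breadth-first search on a small undirected graph. This is a sketch with a made-up five-node network; closeness is taken here as (number of reachable nodes) / (sum of shortest-path lengths), which is one common normalisation and an assumption of this example. Betweenness is left out because it needs all-pairs shortest paths.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CentralityDemo {
    final Map<String, Set<String>> adjacency = new HashMap<>();

    // Adds an undirected edge between a and b.
    void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Degree centrality: number of direct connections.
    int degree(String v) {
        return adjacency.get(v).size();
    }

    // Closeness centrality: reachable nodes divided by the sum of their distances.
    double closeness(String v) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(v, 0);
        queue.add(v);
        int sum = 0;
        while (!queue.isEmpty()) {
            String u = queue.remove();
            for (String w : adjacency.get(u)) {
                if (!dist.containsKey(w)) {
                    dist.put(w, dist.get(u) + 1);
                    sum += dist.get(w);
                    queue.add(w);
                }
            }
        }
        return sum == 0 ? 0.0 : (dist.size() - 1) / (double) sum;
    }

    public static void main(String[] args) {
        CentralityDemo g = new CentralityDemo();
        g.addEdge("a", "b");
        g.addEdge("a", "c");
        g.addEdge("a", "d");
        g.addEdge("d", "e");

        System.out.println(g.degree("a"));    // 3
        System.out.println(g.closeness("a")); // 0.8
    }
}
```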
network structure:
network centralization:
centralized if one or few central nodes
removing these nodes leads to fragmented network
density / cohesion:
#direct_ties / #total_possible
distance:
minimum number of nodes to connect two specific nodes
clustering coefficient:
likelihood two associates of certain node are associates too
high means high clustering
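Density and the local clustering coefficient follow directly from their definitions. The sketch below assumes an undirected graph without self-loops, so the number of possible ties is n(n-1)/2, and uses a made-up four-node network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StructureDemo {
    final Map<String, Set<String>> adjacency = new HashMap<>();

    // Adds an undirected edge between a and b.
    void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Density: direct ties divided by the number of possible ties.
    double density() {
        int n = adjacency.size();
        int ties = 0;
        for (Set<String> neighbours : adjacency.values()) {
            ties += neighbours.size();
        }
        ties /= 2; // every undirected tie was counted from both ends
        return n < 2 ? 0.0 : ties / (n * (n - 1) / 2.0);
    }

    // Clustering coefficient of v: share of neighbour pairs that are connected.
    double clustering(String v) {
        List<String> nb = new ArrayList<>(adjacency.get(v));
        int k = nb.size();
        if (k < 2) {
            return 0.0;
        }
        int linked = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                if (adjacency.get(nb.get(i)).contains(nb.get(j))) {
                    linked++;
                }
            }
        }
        return linked / (k * (k - 1) / 2.0);
    }

    public static void main(String[] args) {
        StructureDemo g = new StructureDemo();
        g.addEdge("a", "b");
        g.addEdge("b", "c");
        g.addEdge("a", "c"); // triangle a-b-c
        g.addEdge("c", "d");

        System.out.println(g.density());       // 4 ties of 6 possible
        System.out.println(g.clustering("a")); // 1.0
        System.out.println(g.clustering("c")); // 1 of 3 neighbour pairs linked
    }
}
```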
general model:
combines different sources, different formats to single model
uniform analysis, uniform result presentation
description in triplets (subject,attribute,object) like Linked Data
graph databases
=====
general:
meta model:
graph containing vertices, edges
edges, vertices may have key-value properties
API:
supports CRUD of metamodel
maybe support traversal of graph
maybe has graph algorithms implemented
characteristics:
ACID, scalable for graphs & big data, REST api
examples:
Objectivity InfiniteGraph, Neo4j, OrientDB
infinite graph:
on top of Objectivity, has graph types & algorithms
distributed graph database
usage:
extend BaseVertex, BaseEdge to use
markModified() after modify, fetch() before read
graphDb.addVertex(v), graphDb.addEdge(v1, v2, EdgeKind.BIDIRECTIONAL)
graphDb.getNamedVertex("name")
trans = graphDb.beginTransaction(AccessMode.READ_WRITE)
navigator engine:
result qualifier (append path to results)
result handler (what happens with path in results)
path qualifier (continue path or not)
path guide (which way to continue, DFS or BFS)
implement Qualifier.qualify(Path p)
Neo4j:
usage:
new GraphDatabaseFactory().newEmbeddedDatabase("name")
tx = graphDb.beginTx(), tx.success(), tx.failure(), tx.finish()
n = createNode(), n.setProperty("prop", "val")
rela = n.createRelationshipTo(n2, R.MY_TYPE), rela.setProperty("name", 30)
integrate with java:
write wrapper with Node n as private final
graph traversal:
td = graphDb.traversalDescription().breadthFirst()
td.relationships(R.MY_TYPE, Direction.OUTGOING), td.evaluator(eval)
tra = td.traverse(n), for Path p:: tra
implement evaluate(Path p) returning exclude/include, continue/prune
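The evaluator idea can be sketched in plain Java without the Neo4j library. Here `Decision` and `traverse` are hypothetical stand-ins written for this example (paths are lists of node names, each node is visited at most once); they only mimic the include/exclude and continue/prune choices described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class TraversalDemo {
    // Breadth-first traversal; the evaluator decides per path whether it is
    // part of the result and whether the traversal continues beyond it.
    static List<List<String>> traverse(Map<String, List<String>> outgoing, String start,
                                       Function<List<String>, Decision> evaluator) {
        List<List<String>> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<List<String>> queue = new ArrayDeque<>();
        queue.add(List.of(start));
        visited.add(start);
        while (!queue.isEmpty()) {
            List<String> path = queue.remove();
            Decision d = evaluator.apply(path);
            if (d.include) {
                result.add(path);
            }
            if (!d.goOn) {
                continue; // pruned: do not expand this path further
            }
            String last = path.get(path.size() - 1);
            for (String next : outgoing.getOrDefault(last, List.of())) {
                if (visited.add(next)) {
                    List<String> longer = new ArrayList<>(path);
                    longer.add(next);
                    queue.add(longer);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outgoing = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "d", List.of("e"));
        // Skip the start node, include everything else, stop after two hops.
        Function<List<String>, Decision> eval = path -> {
            if (path.size() == 1) {
                return Decision.EXCLUDE_AND_CONTINUE;
            }
            return path.size() >= 3 ? Decision.INCLUDE_AND_PRUNE : Decision.INCLUDE_AND_CONTINUE;
        };
        System.out.println(traverse(outgoing, "a", eval)); // [[a, b], [a, c], [a, b, d]]
    }
}

// Hypothetical result type of an evaluator: include or exclude, continue or prune.
enum Decision {
    INCLUDE_AND_CONTINUE(true, true),
    INCLUDE_AND_PRUNE(true, false),
    EXCLUDE_AND_CONTINUE(false, true),
    EXCLUDE_AND_PRUNE(false, false);

    final boolean include;
    final boolean goOn;

    Decision(boolean include, boolean goOn) {
        this.include = include;
        this.goOn = goOn;
    }
}
```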
algorithms:
shortest paths, given length paths, all paths between n1, n2
REST API:
CRUD, traversals, algorithms
cypher:
query language based on pattern matching (START, MATCH, RETURN)
START n=node(12) MATCH n-[:author]->b RETURN b.email
relationship patterns include (A) -> (B), A-[:coauthor]->B
objectivity DB
======
general:
container-based architecture w/ physical identity
meta:
OODBMS since 1995, v10, C++
java, c#, python, smalltalk, ... frontends
data replication, fault tolerance options
all platforms like windows, solaris, linux, mac
cloud computing in AWS
customers from all branches
ideal applications:
store, process complex structures (trees, collections, graphs)
relationship hunting, protein structure, correlation analysis
client architecture:
language interfaces provide access to objects & schema
local storage / transaction cache
client objectivity server (data from local storage, remote processing)
server architecture:
lock server which grants permissions
query server to run queries
data server which handles the memory
performance:
clustering & multi-dimensional indexing
client side, cross-transactional caching
parallel query engine (PQE):
client side task splitter which can aim queries at specific dbs, containers
can split query to multiple agents for parallel processing
storage scopes:
federation (schema, database catalog) as a file (world.fbd)
databases (container catalog) as a distributable file (person.world.DB)
container (page map, for logical partitions) consisting of pages
storage:
each scope/hierarchy can contain up to 2^16 of lower level
pages exist as logical & physical pages, transferred & locked as a unit
objects (consisting of slots) stored in container, addressed with page id
page map:
maps physical to logical pages, journal file saves mapping
on transaction, changes persisted to new defragmented page
on transaction commit, page map is updated, lock released
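The page-map mechanism resembles shadow paging and can be sketched as follows. This is a simplified model, not Objectivity code: lookups are assumed to go from logical page number to physical page, page contents are plain strings, and locking and the journal file are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PageMapDemo {
    private final List<String> physicalPages = new ArrayList<>();   // page storage
    private final Map<Integer, Integer> pageMap = new HashMap<>();  // logical -> physical
    private final Map<Integer, Integer> pending = new HashMap<>();  // uncommitted remappings

    // Writes never overwrite in place: the change goes to a fresh physical page.
    void write(int logicalPage, String content) {
        physicalPages.add(content);
        pending.put(logicalPage, physicalPages.size() - 1);
    }

    // Commit switches the page map over to the new physical pages.
    void commit() {
        pageMap.putAll(pending);
        pending.clear();
    }

    // Abort simply forgets the new pages; the old mapping is untouched.
    void abort() {
        pending.clear();
    }

    // Reads only see committed pages.
    String read(int logicalPage) {
        Integer physical = pageMap.get(logicalPage);
        return physical == null ? null : physicalPages.get(physical);
    }

    public static void main(String[] args) {
        PageMapDemo db = new PageMapDemo();
        db.write(1, "v1");
        db.commit();
        db.write(1, "v2");
        System.out.println(db.read(1)); // v1 (change not committed yet)
        db.commit();
        System.out.println(db.read(1)); // v2
    }
}
```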
c# persistence designer:
used to update schema
generates c# objects, can generate federation files
persistent object model:
basic types ("primitives")
complex types, embedded in parent or referenced (OID stored)
enumerations, collections, relationships
relationships:
unary, binary, to-one, to-many
referential integrity maintained by system (incl. inversion)
storage:
default, non-inline (array stores (identifier, OID))
inline (stored as fields on object, to-many stored in array)
binary associations as complete separate construct
propagation:
deletion, locking can be propagated over relations
developer specifies how
versioning:
when object is copied, specify what happens to relations
copy (new, old associated with same objects)
drop (copy does not have references set)
move (copy has the references set, original does not)
domain classes:
partial classes from .NET to separate application/persistence code
persistence implemented in base class
author.cs contains public get;set to private props
authorpd generated with private get;set; which do persistence stuff
support class w/ schema class, attributes & its properties, proxy cache
connection architecture:
static functions to startup(), open connection, shutdown()
one connection to federation per application
n sessions w/ cache, transaction state, has one, many threads
cache kept after commit, flushed on abort
if update too big, overflow pages prewritten to disk
interaction:
connection -> create session -> begin transaction, get federation
federation will lookup database, create new database
persist objects:
give reference of db, container, other entity in constructor
they are connected, persisted automatically
persistent collections:
sets, lists, maps as ordered, unordered and scalable, non-scalable variants
iterator:
provides access to objects meeting certain criteria
scope is collection, container, db, federation
criteria is PQL predicate as a string
not efficient unless indexes preconstructed
lazy filtering, therefore no sorting
scope name:
can name object, collection at each storage hierarchy (like roots)
retrieve objects:
by scope name, link following, lookup with keys & iterators
parallel query with PQE
content-based filtering (supporting primitive types, group lookups)
retrieve objects by group:
use object iterator for storage hierarchy, name maps, root names, name scope
use collection iterator for lists, sets, object maps
LINQ:
language integrated queries, transforms query in method calls
objects (in memory), SQL (MSSQL), XML, DataSet (ADO.NET)
other providers possible like to db4o
ObjectStore
====
page-based architecture w/ physical identity
query may be executed on server
meta:
personal edition:
lightweight object database
large, single user database
multithreading, small memory footprint
for embedded systems, mobile computing, desktop applications
enterprise:
distributed multi-user
object caching
for clustering, online backup, replication, high availability