====
Dee_
====
------------------------
makes Python_ relational
------------------------
.. todo:
update css
style sheet! (same as web?)
add .. sidebar:: Sidebar Title
:subtitle: Optional Sidebar Subtitle
add image(s)!
:Author: Greg Gaughan
:Copyright: Copyright (C) 2007 Greg Gaughan
:Licence: GPL (see Licence.txt for details)
:Contact: [email protected]
:Date: 31/03/2007
.. _Dee: http://www.quicksort.co.uk
.. _Python: http://www.python.org/
.. contents::
Introduction
------------
Inspired by Date and Darwen's `Databases, Types and the Relational Model (The Third Manifesto)`_, we're putting forward an implementation of a truly relational language using Python_. We will address two problems:
1. The impedance mismatch between programming languages and databases
2. The weakness and syntactic awkwardness of SQL
Mind the Gap
------------
Most of today's programs handle data in one way or another and often this data is stored in some kind of relational database. To read and modify this data, a program must bridge the gap between its representation and the one used by the dialect of SQL that the database provides. This bridge typically comprises a database API that sends queries as text strings, often accompanied by some kind of table-to-object mapper that has to coerce data and relationships in both directions, usually with elaborate layers of abstraction in an effort to keep the two sides loosely coupled.
"Yet by obscuring the true data source these solutions end up throwing away the most compelling feature of relational databases; the ability for the data to be queried."
-- Microsoft, DLinq .NET Language-Integrated Query for Relational Data, May 2006
.. sidebar::
"It was Codd's very great insight that a database could be thought of as a set of relations, that a relation in turn could be thought of as a set of propositions (assumed by convention to be true), and hence that all of the apparatus of formal logic could be directly applied to the problem of database access and related problems."
-- C. J. Date, `The Database Relational Model`_, Addison-Wesley, April 2000
This approach not only adds complexity and increases the need for data transformations but, most importantly, it destroys the significant advantages provided by the relational model of data. The relational model is built upon predicate logic which brings the power of formal reasoning to data: it is the only sound foundation available.
Enough of the Shenanigans!
--------------------------
A number of approaches and frameworks have been proposed to span the gap between the two systems; most never question why there are two systems in the first place.
Microsoft's forthcoming LINQ to SQL (formerly DLinq) is a major attempt to bring SQL closer into the program than before, but will still keep the database sub-language and all that it entails.
"It is no wonder that applications expected to bridge this gap are difficult to build and maintain. It would certainly simplify the equation to get rid of one side or the other. Yet relational databases provide critical infrastructure for long-term storage and query processing, and modern programming languages are indispensable for agile development and rich computation."
-- Microsoft, DLinq .NET Language-Integrated Query for Relational Data, May 2006
The solution to the problem is not to get rid of one side or the other, nor to have one side overlap the other, but to merge the two sides into one: supersede SQL (the COBOL of database languages) with a true relational programming language, one that is computationally complete, and then the gap disappears. Our solution uses one of the most effective, expressive and readable languages available, Python_, and extends it with relations and a sound relational algebra.
A Bit of History
----------------
Since its inception in 1969 by E. F. Codd, the relational model has been the foundation for nearly all databases. It replaced earlier network and hierarchical ad-hoc approaches to data storage by being as simple as it needed to be, but no simpler. It was so powerful it allowed users to ask for what they wanted to find, rather than specify how they might find it.
Over the decades, SQL has become the de-facto language for relational databases, but SQL misses many of the benefits of relational technology. In recent years, partly due to SQL's weaknesses and partly due to minimalistic and stagnant implementations, the database has become merely a storage engine fronted by layers of drivers, mappers, hierarchical markups and frameworks which make flexible querying both complex and distant from the application code.
Where We're Coming From
-----------------------
Having implemented a comprehensive, standards-compliant SQL server, `ThinkSQL <http://www.thinksql.co.uk>`_, we did some further research into the history of SQL's dominance in the marketplace and its quirky syntax. We found a far superior alternative in the form of **D** [#]_, a generic name for any relational language that conforms to `The Third Manifesto`_. We’ve implemented such a language, Dee_, as an extension to Python_.
The relational algebra and most of the ideas underlying Dee_ come from Date and Darwen's `Databases, Types and the Relational Model (The Third Manifesto)`_. An introduction to the ideas behind it can be found in `Databases in Depth`_, and many related links and reference materials are on `The Third Manifesto website <http://www.thethirdmanifesto.com>`_.
The current version of Dee_ is an initial release to gain feedback regarding the approach. We chose Python_ because its interpreted style, dynamic typing and built-in sets and dictionaries make it ideal for interacting with data; plus any language that allows you to do the following sorts of things has got to be good:
.. sourcecode:: pycon
>>> x, y = 45, 90
>>> print x, y
45 90
>>> x, y = y, x #swapping values without the usual temporary variable!
>>> print x, y
90 45
>>> 70 < x < 120
True
See `Why Use Python? <http://www.python.org/doc/essays/ppt/acm-ws/sld011.htm>`_ for more information on the advantages of the language. A guide to the Python language can be found in `An Introduction to Python`_. We do assume you are familiar with Python in what follows.
Where We're Going
-----------------
The current release is an initial proposal, intended to encourage feedback. We have many ideas for future versions to make it more deployable. See the `Future Work`_ section below for more details.
Basics
------
To start using Dee_ from within the Python_ interpreter or from a Python_ program, first import the module. (For demonstration purposes we import everything, but in your own code we recommend importing only the features you need.)
.. sourcecode:: pycon
>>> from Dee import *
Tuples
~~~~~~
A Tuple is a set of attribute/value pairs. A Tuple can be represented by a Python_ dictionary, e.g.
.. sourcecode:: pycon
>>> print {"StudentId":'S1', "Name":'Anne'}
{'StudentId': 'S1', 'Name': 'Anne'}
and the attributes and values can be extracted using the standard Python_ syntax, e.g.
.. sourcecode:: pycon
>>> t1 = {"StudentId":'S1', "Name":'Anne'}
>>> t1["StudentId"]
'S1'
>>> "Name" in t1
True
>>> t1.keys()
['StudentId', 'Name']
A more powerful way is to use the Tuple class which allows a slightly simpler syntax for denoting attribute values. To specify a Tuple:
.. sourcecode:: pycon
>>> t1 = Tuple(StudentId='S1', Name='Anne')
and then the attribute values can be extracted in the same way as with the Python_ dictionary, but also using dot notation without the quotes, e.g.
.. sourcecode:: pycon
>>> t1["Name"]
'Anne'
>>> t1.Name
'Anne'
The Tuple class also provides a number of useful methods, such as project and remove, for manipulating relational tuples.
Attribute values are dynamically typed in the usual Python_ way and they must be of the same type for every tuple in a given relation. Currently, the types can be anything that can be `pickled <http://docs.python.org/lib/module-pickle.html>`_.
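As an illustration only, the combination of dictionary access and dot notation described above can be sketched in a few lines of modern Python 3. The class name ``MiniTuple`` is hypothetical and this is not Dee's ``Tuple`` implementation; the behaviour of ``project`` (keep the named attributes) and ``remove`` (drop them) is an assumption based on their names:

```python
class MiniTuple(dict):
    """Hypothetical stand-in for a relational tuple; not Dee's Tuple class."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, so dict methods keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def project(self, names):
        # Keep only the named attributes.
        return MiniTuple((k, self[k]) for k in names)

    def remove(self, names):
        # Drop the named attributes and keep the rest.
        return MiniTuple((k, v) for k, v in self.items() if k not in names)


t1 = MiniTuple(StudentId="S1", Name="Anne")
print(t1["Name"], t1.Name)          # Anne Anne
print(t1.project(["StudentId"]))    # {'StudentId': 'S1'}
print(t1.remove(["StudentId"]))     # {'Name': 'Anne'}
```

Subclassing ``dict`` keeps the standard dictionary syntax working unchanged; the cost is that an attribute named like a dictionary method (for example ``keys``) can only be reached with the bracket form.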
Relations
~~~~~~~~~
A Relation comprises a heading and a body. The heading is a set of attribute name/type pairs. The body is a set of tuples. Each tuple in the body comprises a value for every attribute in the heading. To specify a relation literal, pass the heading as a list of attribute names followed by the body as a list of tuple literals, e.g.:
.. sourcecode:: pycon
>>> print Relation(["StudentId", "Name"],
... [{"StudentId":'S1', "Name":'Anne'},
... {"StudentId":'S2', "Name":'Boris'},
... {"StudentId":'S3', "Name":'Cindy'},
... {"StudentId":'S4', "Name":'Devinder'},
... {"StudentId":'S5', "Name":'Boris'},
... ])
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
Note:
* there is no order to the heading attributes (they are a set)
* nor is there any order to the tuples in the body (they are a set)
* there is no duplication in the heading attribute names (they are a set)
* nor is there any duplication in the tuples in the body (they are a set)
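The four notes above follow directly from treating both the tuple and the body as sets. A minimal Python 3 sketch, using plain ``frozenset`` values rather than Dee's classes (the helper name ``make_body`` is hypothetical):

```python
def make_body(rows):
    # Freeze each row so it is hashable, then collect the rows in a set.
    return {frozenset(row.items()) for row in rows}


body_a = make_body([
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
])
body_b = make_body([
    {"Name": "Boris", "StudentId": "S2"},   # attributes in a different order
    {"StudentId": "S1", "Name": "Anne"},    # tuples in a different order
    {"StudentId": "S1", "Name": "Anne"},    # a duplicate, silently absorbed
])

print(body_a == body_b)   # True
print(len(body_b))        # 2
```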
Also, we will try to use the term **relation variable** when we mean a variable that refers to a Relation, and just **relation** (or relation value) to mean the value of the relation. This is an important distinction. The value of a relation never changes, just like the value 5 never changes.
To assign a relation value to a relation variable, use the standard Python_ syntax, e.g.
.. sourcecode:: pycon
>>> IS_CALLED = Relation(["StudentId", "Name"],
... [{"StudentId":'S1', "Name":'Anne'},
... {"StudentId":'S2', "Name":'Boris'},
... {"StudentId":'S3', "Name":'Cindy'},
... {"StudentId":'S4', "Name":'Devinder'},
... {"StudentId":'S5', "Name":'Boris'},
... ])
An alternative way to define a relation is to use the Tuple class to define the body:
.. sourcecode:: pycon
>>> IS_CALLED = Relation(["StudentId", "Name"],
... [Tuple(StudentId='S1', Name='Anne'),
... Tuple(StudentId='S2', Name='Boris'),
... Tuple(StudentId='S3', Name='Cindy'),
... Tuple(StudentId='S4', Name='Devinder'),
... Tuple(StudentId='S5', Name='Boris'),
... ])
or alternatively, a more concise option is available which relies on the order of the values in each body tuple matching the order of the heading:
.. sourcecode:: pycon
>>> IS_CALLED = Relation(["StudentId", "Name"],
... [('S1', 'Anne'),
... ('S2', 'Boris'),
... ('S3', 'Cindy'),
... ('S4', 'Devinder'),
... ('S5', 'Boris'),
... ])
(Note that Python_ allows an additional comma after the last item in a list, which can simplify copy/paste operations. Also, a Python_ tuple with a single value must have a comma after the value to distinguish it from a value in parentheses, e.g. ``(7,)`` rather than ``(7)``.)
There are a number of ways to display a relation:
1. Print it as a string (i.e. using its ``__str__`` method), e.g.
.. sourcecode:: pycon
>>> print IS_CALLED
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
2. Print a literal representation (one of possibly many variations) (i.e. using its ``__repr__`` method), e.g.
.. sourcecode:: pycon
>>> print `IS_CALLED` #or just: >>> IS_CALLED
Relation(('StudentId', 'Name'),
[Tuple(StudentId='S1', Name='Anne'), Tuple(StudentId='S2', Name='Boris'), Tuple(StudentId='S3', Name='Cindy'), Tuple(StudentId='S4', Name='Devinder'), Tuple(StudentId='S5', Name='Boris')],
{'PK':(Key, None)})
Note: this literal can itself be evaluated using Python_'s ``eval()`` function to retrieve the relation's value, e.g.
.. sourcecode:: pycon
>>> print eval(`IS_CALLED`)
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
>>> r2=eval(`IS_CALLED`)
>>> print r2
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
3. Print it rendered as an HTML table, e.g.
.. sourcecode:: pycon
>>> print IS_CALLED.renderHTML()
<table><thead><th><em>Studentid</em></th><th><em>Name</em></th></thead><tbody><tr><td>S1</td><td>Anne</td></tr><tr><td>S2</td><td>Boris</td></tr><tr><td>S3</td><td>Cindy</td></tr><tr><td>S4</td><td>Devinder</td></tr><tr><td>S5</td><td>Boris</td></tr></tbody></table>
Which in a browser becomes:
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
+-----------+----------+
| S2        | Boris    |
+-----------+----------+
| S3        | Cindy    |
+-----------+----------+
| S4        | Devinder |
+-----------+----------+
| S5        | Boris    |
+-----------+----------+
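The second display option, a literal that ``eval()`` can turn back into a value, is a general Python idiom: make ``__repr__`` return a valid constructor call. A minimal Python 3 sketch, assuming a hypothetical ``MiniRelation`` class that stores the heading as an ordered tuple for simplicity (this is not Dee's ``Relation``):

```python
class MiniRelation:
    """Hypothetical relation whose repr() is a valid constructor call."""

    def __init__(self, heading, body):
        self.heading = tuple(heading)
        self.body = frozenset(tuple(row) for row in body)

    def __repr__(self):
        # Sort the body so the literal is stable from run to run.
        return "MiniRelation(%r, %r)" % (self.heading, sorted(self.body))

    def __eq__(self, other):
        return (self.heading, self.body) == (other.heading, other.body)


r = MiniRelation(["StudentId", "Name"], [("S1", "Anne"), ("S2", "Boris")])
literal = repr(r)
r2 = eval(literal)      # rebuild a relation from its own literal
print(literal)
print(r2 == r)          # True
```

As with any use of ``eval()``, this is only appropriate for literals that come from a trusted source.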
The heading of a relation can be retrieved via its ``heading`` method, which returns the attribute names as a Python_ set, e.g.
.. sourcecode:: pycon
>>> print IS_CALLED.heading()
set(['StudentId', 'Name'])
The Interpretation of a Relation
********************************
Given a relation such as the one denoted by IS_CALLED above, we should take the meaning of it to be as follows:
* The heading supplies the parameters for the **predicate**, e.g. StudentId and Name are the parameters for the IS_CALLED predicate.
* The tuple ``Tuple(StudentId='S3', Name='Cindy')`` is an *instantiation* of that predicate. It is a **proposition** where the argument values 'S3' and 'Cindy' are substituted for the parameters. This states that student S3 is called Cindy.
* Each tuple in the relation is a *true* instantiation.
* Any tuple not in the relation is a *false* instantiation.
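The interpretation above can be made concrete with an ordinary Python 3 set: the predicate becomes a function of the heading's parameters, and a proposition is true exactly when its tuple is a member of the body. The names ``IS_CALLED_BODY`` and ``is_called`` are hypothetical and used only for this sketch:

```python
IS_CALLED_BODY = {
    ("S1", "Anne"), ("S2", "Boris"), ("S3", "Cindy"),
    ("S4", "Devinder"), ("S5", "Boris"),
}


def is_called(student_id, name):
    """The predicate 'student <student_id> is called <name>'."""
    # A proposition is true exactly when its tuple is in the body.
    return (student_id, name) in IS_CALLED_BODY


print(is_called("S3", "Cindy"))   # True
print(is_called("S3", "Boris"))   # False
```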
Function-based Relations
~~~~~~~~~~~~~~~~~~~~~~~~
Instead of defining the value of a relation variable once when it is assigned, we can refer to a function to provide the relation. The function can then return different values at different times. One important kind of relation variable that refers to a function for its data is a virtual (or derived) relation variable. A **virtual** relation variable refers to a function that returns a relational expression. All other relation variables are **base** relation variables. To specify a virtual relation variable we first need to define a function that provides the data by returning a relational expression. For example (ignore the relational expression syntax for now; we'll cover the details later):
.. sourcecode:: pycon
>>> def vIS_CALLED_caps():
...     return IS_CALLED.extend(['NameCaps'], lambda t: {'NameCaps': t.Name.upper()}).remove(['Name'])
Then pass the heading as a list of attribute names followed by the body as a function reference, e.g.
.. sourcecode:: pycon
>>> IS_CALLED_caps = Relation(["StudentId", "NameCaps"], vIS_CALLED_caps)
>>> print IS_CALLED_caps
+-----------+----------+
| StudentId | NameCaps |
+===========+==========+
| S1        | ANNE     |
| S2        | BORIS    |
| S3        | CINDY    |
| S4        | DEVINDER |
| S5        | BORIS    |
+-----------+----------+
Such virtual relation variables' values will then vary as the underlying base relation variables vary. These virtual relation variables are called views in SQL.
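The way a virtual relation variable tracks its base relation variables can be sketched without Dee, using a plain Python 3 function that is re-evaluated each time the value is needed. The names ``students`` and ``caps_view`` are hypothetical:

```python
students = {("S1", "Anne"), ("S2", "Boris")}   # a base relation variable


def caps_view():
    # Recomputed on every call, so it always reflects the current base value.
    return {(sid, name.upper()) for sid, name in students}


before = caps_view()
students = students | {("S3", "Cindy")}   # assign a new value to the base variable
after = caps_view()

print(sorted(before))   # [('S1', 'ANNE'), ('S2', 'BORIS')]
print(sorted(after))    # [('S1', 'ANNE'), ('S2', 'BORIS'), ('S3', 'CINDY')]
```

Note that the base variable is given a new value rather than being modified in place, in keeping with the distinction between relation variables and relation values.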
Relation-Valued Attributes
~~~~~~~~~~~~~~~~~~~~~~~~~~
An attribute value can itself be a relation. Such attributes are known as relation-valued attributes or RVAs. There are a number of relational operators (actually macros) that use such nested relations. For example, ``GROUP`` takes a relation, a set of attribute names and a new attribute name, and returns a relation in which the named attributes are collected into a nested relation, one for each distinct combination of values of the non-grouped attributes:
.. sourcecode:: pycon
>>> print GROUP(IS_CALLED, ['StudentId'], 'StudentIds')
+----------+---------------+
| Name     | StudentIds    |
+==========+===============+
| Anne     | +-----------+ |
|          | | StudentId | |
|          | +===========+ |
|          | | S1        | |
|          | +-----------+ |
| Boris    | +-----------+ |
|          | | StudentId | |
|          | +===========+ |
|          | | S2        | |
|          | | S5        | |
|          | +-----------+ |
| Cindy    | +-----------+ |
|          | | StudentId | |
|          | +===========+ |
|          | | S3        | |
|          | +-----------+ |
| Devinder | +-----------+ |
|          | | StudentId | |
|          | +===========+ |
|          | | S4        | |
|          | +-----------+ |
+----------+---------------+
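The grouping shown above can be approximated in plain Python 3 to make the mechanics visible. This is a sketch only: the function ``group`` and its argument names are hypothetical, rows are ordinary dictionaries, and each nested relation is represented as a ``frozenset`` of frozen rows:

```python
def group(rows, grouped_names, new_name):
    """Nest grouped_names into a relation-valued attribute called new_name."""
    nested = {}
    for row in rows:
        # The attributes that are not grouped identify each output tuple.
        rest = frozenset((k, v) for k, v in row.items() if k not in grouped_names)
        inner = frozenset((k, row[k]) for k in grouped_names)
        nested.setdefault(rest, set()).add(inner)
    return [dict(rest, **{new_name: frozenset(inner_rows)})
            for rest, inner_rows in nested.items()]


is_called = [
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S3", "Name": "Cindy"},
    {"StudentId": "S4", "Name": "Devinder"},
    {"StudentId": "S5", "Name": "Boris"},
]

grouped = group(is_called, ["StudentId"], "StudentIds")
by_name = {row["Name"]: row["StudentIds"] for row in grouped}
for name in sorted(by_name):
    ids = sorted(dict(inner)["StudentId"] for inner in by_name[name])
    print(name, ids)
```

Each distinct ``Name`` yields one output row, and the two students called Boris end up in the same nested relation.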
Predefined Relations
~~~~~~~~~~~~~~~~~~~~
There are two interesting relations that are useful for defining some fundamental relational operators in Dee_. We introduce them here.
DUM
***
This is the relation that has no attributes and no tuples. It plays the role of False. It is difficult to display:
.. sourcecode:: pycon
>>> print DUM
+
|
+
+
>>> print DUM.renderHTML()
<table><thead></thead><tbody></tbody></table>
It is also called TABLE_DUM and FALSE.
DEE
***
This is the relation that has no attributes and a single tuple. It plays the role of True. It is difficult to display:
.. sourcecode:: pycon
>>> print DEE
+
|
+
|
+
>>> print DEE.renderHTML()
<table><thead></thead><tbody><tr></tr></tbody></table>
It is also called TABLE_DEE and TRUE.
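One way to see why these two relations play the roles of True and False is to project a relation onto no attributes at all: a non-empty relation collapses to the single empty tuple, and an empty relation stays empty. The Python 3 sketch below uses plain sets to show this; ``project`` and the lower-case variable names are hypothetical helpers, not Dee's API:

```python
def project(body, names):
    """Keep only the named attributes of every tuple in the body."""
    return {frozenset((k, v) for k, v in t if k in names) for t in body}


DEE = {frozenset()}   # no attributes, exactly one (empty) tuple
DUM = set()           # no attributes, no tuples

students = {
    frozenset({("StudentId", "S1"), ("Name", "Anne")}),
    frozenset({("StudentId", "S2"), ("Name", "Boris")}),
}
borises = {t for t in students if dict(t)["Name"] == "Boris"}
cindys = {t for t in students if dict(t)["Name"] == "Cindy"}

print(project(borises, []) == DEE)   # True: somebody is called Boris
print(project(cindys, []) == DUM)    # True: nobody is called Cindy
```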
Relation Constraints
~~~~~~~~~~~~~~~~~~~~
A Relation (function-based or not) can also take an extra parameter in its constructor to specify a set of constraints. This takes the form of a Python_ dictionary where each key gives the constraint name and each value is a pair of constraint-function, parameters. For example, to specify that the "StudentId" attribute is a candidate key for the above relation we could say:
.. sourcecode:: pycon
>>> IS_CALLED = Relation(["StudentId", "Name"],
... [('S1', 'Anne'),
... ('S2', 'Boris'),
... ('S3', 'Cindy'),
... ('S4', 'Devinder'),
... ('S5', 'Boris'),
... ],
... {'PK':(Key, ["StudentId"])}
... )
>>> print IS_CALLED
+-----------+----------+
| StudentId | Name     |
+===========+----------+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
Here, ``Key`` is a pre-defined constraint type (actually a function wrapper that creates a function) that takes a list of attributes to enforce the constraint. A constraint function can return True or False and is called whenever the relation is assigned a new value. If no candidate key is specified for a relation, one is assumed comprising all the attributes in the relation (this is displayed in representations as ``{'PK':(Key, None)}``). As another example:
.. sourcecode:: pycon
>>> COURSE = Relation(["CourseId", "Title"],
... [('C1', 'Database'),
... ('C2', 'HCI'),
... ('C3', 'Op Systems'),
... ('C4', 'Programming'),
... ],
... {'PK':(Key, ["CourseId"])}
... )
Another pre-defined constraint (function wrapper) is ``ForeignKey``. It takes a relation name and a mapping of foreign key attributes to candidate key attributes as parameters, e.g.:
.. sourcecode:: pycon
>>> IS_ENROLLED_ON = Relation(["StudentId", "CourseId"],
... [('S1', 'C1'),
... ('S1', 'C2'),
... ('S2', 'C1'),
... ('S3', 'C3'),
... ('S4', 'C1'),
... ],
... {'FKS':(ForeignKey, ('IS_CALLED', {"StudentId":"StudentId"})),
... 'FKC':(ForeignKey, ('COURSE', {"CourseId":"CourseId"}))}
... )
Here, two foreign keys are declared to ensure referential integrity between this relation and the relations referred to by IS_CALLED and COURSE.
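To show what such constraint functions have to check, here is a rough Python 3 sketch of a candidate key test and a foreign key test over lists of dictionaries. The function names ``key_holds`` and ``foreign_key_holds`` are hypothetical, and this is not how Dee's ``Key`` and ``ForeignKey`` wrappers are implemented:

```python
def key_holds(rows, key_names):
    """True if no two rows share the same values for key_names."""
    seen = [tuple(row[k] for k in key_names) for row in rows]
    return len(seen) == len(set(seen))


def foreign_key_holds(rows, target_rows, mapping):
    """True if every row's foreign key values appear in target_rows.

    mapping pairs each foreign key attribute with the attribute it references.
    """
    targets = {tuple(t[ref] for ref in mapping.values()) for t in target_rows}
    return all(tuple(r[fk] for fk in mapping) in targets for r in rows)


is_called = [
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S5", "Name": "Boris"},
]
enrolled = [
    {"StudentId": "S1", "CourseId": "C1"},
    {"StudentId": "S2", "CourseId": "C1"},
]
dangling = enrolled + [{"StudentId": "S9", "CourseId": "C1"}]

print(key_holds(is_called, ["StudentId"]))                                  # True
print(key_holds(is_called, ["Name"]))                                       # False
print(foreign_key_holds(enrolled, is_called, {"StudentId": "StudentId"}))   # True
print(foreign_key_holds(dangling, is_called, {"StudentId": "StudentId"}))   # False
```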
Lambda
------
In a number of places we need to pass expressions, e.g. restrictions (where clauses). Python_ has a built-in way of defining such expressions with anonymous functions using the ``lambda`` keyword. So an example restriction for the above IS_CALLED relation could be:
.. sourcecode:: pycon
>>> print IS_CALLED.where(lambda t: t.Name == 'Boris')
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+
In this example, the lambda expression is passed to the relation's ``where`` function and the expression introduces a range variable, ``t``, which will stand for each Tuple in the relation. The expression itself, the part after the colon, tests whether the Name attribute of each tuple is equal to 'Boris': if it is then the tuple is included in the result. Any Python_ expression can be passed this way. So here, complex boolean expressions including boolean operators and function calls can be built, e.g.
.. sourcecode:: pycon
>>> print IS_CALLED.where(lambda t: t.Name.startswith('B') and t.StudentId.endswith('5'))
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S5        | Boris |
+-----------+-------+
>>> print IS_CALLED.where(lambda t: 'A' < t.Name[0] < 'D')
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S3        | Cindy |
| S5        | Boris |
+-----------+-------+
>>> print IS_CALLED.where(lambda t: t["Name"].startswith('B'))
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+
Of course, simple boolean expressions can also be used, e.g.
.. sourcecode:: pycon
>>> print IS_CALLED.where(lambda t: True)
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
>>> print IS_CALLED.where(lambda t: False)
+-----------+------+
| StudentId | Name |
+===========+======+
+-----------+------+
It's perhaps worth noting that the where function is really just shorthand for a natural join. Take the first example:
.. sourcecode:: pycon
>>> print IS_CALLED.where(lambda t: t.Name == 'Boris')
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+
This relational calculus based where clause can be rephrased using the relational algebra's ``AND`` operator (in this case acting as the natural join):
.. sourcecode:: pycon
>>> print IS_CALLED & Relation(["Name"], [('Boris',)])
+-----------+-------+
| StudentId | Name  |
+===========+=======+
| S2        | Boris |
| S5        | Boris |
+-----------+-------+
Many of the relational methods provided are in fact macros implemented using only a few fundamental relational operators, such as ``AND``.
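The equivalence between an equality restriction and a natural join can be checked with a small Python 3 model in which each tuple is a ``frozenset`` of ``(name, value)`` pairs. The functions ``where``, ``join`` and ``make_body`` below are hypothetical stand-ins written for this sketch, not Dee's implementation:

```python
def where(body, predicate):
    """Restriction: keep the tuples for which predicate is true."""
    return {t for t in body if predicate(dict(t))}


def join(r, s):
    """Natural join of two bodies of frozenset((name, value), ...) tuples."""
    result = set()
    for t in r:
        for u in s:
            merged = dict(t)
            # Two tuples combine only if they agree on every shared attribute.
            if all(merged.get(k, v) == v for k, v in u):
                merged.update(u)
                result.add(frozenset(merged.items()))
    return result


def make_body(rows):
    return {frozenset(row.items()) for row in rows}


is_called = make_body([
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S3", "Name": "Cindy"},
    {"StudentId": "S4", "Name": "Devinder"},
    {"StudentId": "S5", "Name": "Boris"},
])
boris = make_body([{"Name": "Boris"}])

restricted = where(is_called, lambda t: t["Name"] == "Boris")
joined = join(is_called, boris)
print(restricted == joined)   # True
print(len(joined))            # 2
```

Because the one-tuple relation shares only the ``Name`` attribute with ``is_called``, the join keeps exactly those tuples whose ``Name`` matches.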
Another place lambda expressions can be used is when defining virtual relation variables. For example the earlier example:
.. sourcecode:: pycon
>>> def vIS_CALLED_caps():
...     return IS_CALLED.extend(['NameCaps'], lambda t: {'NameCaps': t.Name.upper()}).remove(['Name'])
>>> IS_CALLED_caps = Relation(["StudentId", "NameCaps"], vIS_CALLED_caps)
>>> print IS_CALLED_caps
+-----------+----------+
| StudentId | NameCaps |
+===========+==========+
| S1        | ANNE     |
| S2        | BORIS    |
| S3        | CINDY    |
| S4        | DEVINDER |
| S5        | BORIS    |
+-----------+----------+
This could be re-coded more concisely using ``lambda``:
.. sourcecode:: pycon
>>> IS_CALLED_caps = Relation(["StudentId", "NameCaps"],
... lambda: IS_CALLED.extend(["NameCaps"], lambda t: {
... "NameCaps": t.Name.upper()}).remove(["Name"]))
>>> print IS_CALLED_caps
+-----------+----------+
| StudentId | NameCaps |
+===========+==========+
| S1        | ANNE     |
| S2        | BORIS    |
| S3        | CINDY    |
| S4        | DEVINDER |
| S5        | BORIS    |
+-----------+----------+
Lambda expressions can also be used as general constraints. On relations, another pre-defined constraint is ``Constraint``. This takes a function that must evaluate to True for the constraint to hold, e.g.:
.. sourcecode:: pycon
>>> EXAM_MARK = Relation(["StudentId", "CourseId", "Mark"],
... [('S1', 'C1', 85),
... ('S1', 'C2', 49),
... ('S2', 'C1', 49),
... ('S3', 'C3', 66),
... ('S4', 'C1', 93),
... ],
... {'PK':(Key, ["StudentId", "CourseId"]),
... 'MarkRange': (Constraint, lambda r: ALL(r, lambda t: 0 <= t.Mark <= 100))}
... )
Here, the 'MarkRange' Constraint uses the ``ALL`` relational operator (discussed below) to ensure that all Marks in this relation are between 0 and 100. Note the Constraint works at the relation level and its range variable is ``r`` in the example. Useful operators at this level are ``ALL``, ``ANY``, ``IS_EMPTY``, and the relational comparison operators discussed below, because they all take relations and return a boolean result.
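The relation-level operators mentioned above map naturally onto Python's built-in ``all`` and ``any``. The following Python 3 sketch assumes rows held as a list of dictionaries; ``all_rows``, ``any_row``, ``is_empty`` and ``mark_range`` are hypothetical names, not Dee's ``ALL``, ``ANY`` and ``IS_EMPTY``:

```python
def all_rows(rows, predicate):
    """True if predicate holds for every row (vacuously true when empty)."""
    return all(predicate(row) for row in rows)


def any_row(rows, predicate):
    """True if predicate holds for at least one row."""
    return any(predicate(row) for row in rows)


def is_empty(rows):
    return len(rows) == 0


def mark_range(rows):
    # A relation-level constraint: it receives the whole body, not one tuple.
    return all_rows(rows, lambda t: 0 <= t["Mark"] <= 100)


exam_mark = [
    {"StudentId": "S1", "CourseId": "C1", "Mark": 85},
    {"StudentId": "S1", "CourseId": "C2", "Mark": 49},
    {"StudentId": "S4", "CourseId": "C1", "Mark": 93},
]
bad = exam_mark + [{"StudentId": "S2", "CourseId": "C1", "Mark": 101}]

print(mark_range(exam_mark))                         # True
print(mark_range(bad))                               # False
print(any_row(exam_mark, lambda t: t["Mark"] > 90))  # True
print(is_empty([]))                                  # True
```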
Relations to Tuples
-------------------
Here are some conversion functions to map between relations and tuples:
fromTuple
~~~~~~~~~
This static method returns a relation from a tuple:
.. sourcecode:: pycon
>>> r1 = Relation.fromTuple({'CourseId':'C1', 'Title':'Database'})
>>> print r1
+----------+----------+
| CourseId | Title    |
+==========+==========+
| C1       | Database |
+----------+----------+
It can also take an extra parameter to specify a set of constraints:
.. sourcecode:: pycon
>>> r1 = Relation.fromTuple({'CourseId':'C1', 'Title':'Database'}, {'PK':(Key, ['CourseId'])})
>>> print r1
+----------+----------+
| CourseId | Title    |
+==========+----------+
| C1       | Database |
+----------+----------+
toTuple
~~~~~~~
This can apply only to a single-tuple relation and returns a tuple from that relation:
.. sourcecode:: pycon
>>> t1 = r1.toTuple()
>>> print t1
Tuple(CourseId='C1', Title='Database')
>>> print t1.Title
Database
fromTupleList
~~~~~~~~~~~~~
This static method returns a relation from a list of tuples:
.. sourcecode:: pycon
>>> r2 = Relation.fromTupleList([{'CourseId':'C1', 'Title':'Database'},
... {'CourseId':'C4', 'Title':'Programming'},
... {'CourseId':'C3', 'Title':'Op Systems'},
... {'CourseId':'C2', 'Title':'HCI'}])
>>> print r2
+----------+-------------+
| CourseId | Title       |
+==========+=============+
| C1       | Database    |
| C4       | Programming |
| C3       | Op Systems  |
| C2       | HCI         |
+----------+-------------+
It can also take an extra parameter to specify a set of constraints:
.. sourcecode:: pycon
>>> r2 = Relation.fromTupleList([{'CourseId':'C1', 'Title':'Database'},
... {'CourseId':'C4', 'Title':'Programming'},
... {'CourseId':'C3', 'Title':'Op Systems'},
... {'CourseId':'C2', 'Title':'HCI'}],
... {'PK':(Key, ['CourseId'])})
>>> print r2
+----------+-------------+
| CourseId | Title       |
+==========+-------------+
| C1       | Database    |
| C4       | Programming |
| C3       | Op Systems  |
| C2       | HCI         |
+----------+-------------+
toTupleList
~~~~~~~~~~~
This returns a list of tuples from the relation. Since relations are sets they can have no order, so to iterate through all the tuples in a relation you must use this method to first extract a list of tuples from the relation.
.. sourcecode:: pycon
>>> ts = r2.toTupleList()
>>> print ts
[Tuple(CourseId='C1', Title='Database'), Tuple(CourseId='C4', Title='Programming'), Tuple(CourseId='C3', Title='Op Systems'), Tuple(CourseId='C2', Title='HCI')]
This list can then be iterated over in the usual ways, e.g.:
.. sourcecode:: pycon
>>> for t in ts:
...     print t.Title
Database
Programming
Op Systems
HCI
>>> print [t.Title for t in ts if t.CourseId=='C4']
['Programming']
>>> for t in reversed(ts):
...     print t.Title
HCI
Op Systems
Programming
Database
>>> print len(ts)
4
>>> print ts[0]
Tuple(CourseId='C1', Title='Database')
>>> print ts[-1]
Tuple(CourseId='C2', Title='HCI')
This is also the way to access the tuples in a pre-defined order. The ``toTupleList`` method can take an extra parameter to define a sort order. The sort parameter is a pair ``(ascending, attribute-list)`` where ``ascending`` is a boolean flag to indicate whether to sort in ascending order or not, and the ``attribute-list`` specifies the attributes to sort on.
.. sourcecode:: pycon
>>> tss = r2.toTupleList((True, ['Title']))
>>> print [t.Title for t in tss]
['Database', 'HCI', 'Op Systems', 'Programming']
>>> tss = r2.toTupleList((False, ['CourseId']))
>>> print [t.CourseId for t in tss]
['C4', 'C3', 'C2', 'C1']
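The ``(ascending, attribute-list)`` sort parameter can be understood as a thin layer over Python's own sorting. A minimal Python 3 sketch, assuming rows held as dictionaries and a hypothetical function name ``to_tuple_list`` (this is not Dee's ``toTupleList``):

```python
def to_tuple_list(rows, sort=None):
    """Return the rows as a list, optionally ordered by a sort specification."""
    result = list(rows)
    if sort is not None:
        ascending, names = sort
        # Build a composite key from the listed attributes, in order.
        result.sort(key=lambda row: tuple(row[n] for n in names),
                    reverse=not ascending)
    return result


courses = [
    {"CourseId": "C1", "Title": "Database"},
    {"CourseId": "C4", "Title": "Programming"},
    {"CourseId": "C3", "Title": "Op Systems"},
    {"CourseId": "C2", "Title": "HCI"},
]

by_title = [t["Title"] for t in to_tuple_list(courses, (True, ["Title"]))]
by_id_desc = [t["CourseId"] for t in to_tuple_list(courses, (False, ["CourseId"]))]
print(by_title)     # ['Database', 'HCI', 'Op Systems', 'Programming']
print(by_id_desc)   # ['C4', 'C3', 'C2', 'C1']
```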
The ``renderHTML`` method, mentioned earlier, is built upon the ``toTupleList`` method and also allows this sort parameter, e.g.:
.. sourcecode:: pycon
>>> print r2.renderHTML(sort=(True, ['Title']))
<table><thead><th><em>Courseid</em></th><th>Title</th></thead><tbody><tr><td>C1</td><td>Database</td></tr><tr><td>C2</td><td>HCI</td></tr><tr><td>C3</td><td>Op Systems</td></tr><tr><td>C4</td><td>Programming</td></tr></tbody></table>
Which in a browser becomes:
+----------+-------------+
| CourseId | Title       |
+==========+=============+
| C1       | Database    |
+----------+-------------+
| C2       | HCI         |
+----------+-------------+
| C3       | Op Systems  |
+----------+-------------+
| C4       | Programming |
+----------+-------------+
Relational Comparisons
----------------------
A number of boolean operators are available to compare the values of two relations. These are all implemented with the obvious overloaded Python_ comparisons.
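Since a relation body is a set of tuples, these comparisons behave much like Python's built-in set comparisons. The following plain Python 3 sketch, which does not use Dee, models a relation body as a ``frozenset`` of named tuples; ``Student``, ``is_called``, ``only_s3`` and ``all_s`` are hypothetical names for illustration only. One difference to keep in mind: on a Python set, ``in`` tests for a single element, so the sketch uses ``<=`` where a relation is compared with a relation.

```python
from collections import namedtuple

# Hypothetical stand-in for a Dee tuple; not the library's Tuple class.
Student = namedtuple("Student", ["StudentId", "Name"])

is_called = frozenset([
    Student("S1", "Anne"),
    Student("S2", "Boris"),
    Student("S3", "Cindy"),
    Student("S4", "Devinder"),
    Student("S5", "Boris"),
])

# Two restrictions of the same body: one tuple, and all tuples.
only_s3 = frozenset(t for t in is_called if t.StudentId == "S3")
all_s = frozenset(t for t in is_called if t.StudentId.startswith("S"))

print(only_s3 < is_called)    # True: proper subset
print(all_s < is_called)      # False: equal bodies are not proper subsets
print(all_s <= is_called)     # True: subset allows equality
print(is_called >= only_s3)   # True: superset
print(all_s == is_called)     # True: equality ignores construction order
print(Student("S3", "Cindy") in is_called)  # True: tuple membership
```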
Equality (==)
~~~~~~~~~~~~~
.. sourcecode:: pycon
>>> print IS_CALLED == Relation(["StudentId", "Name"],
... [('S1', 'Anne'),
... ('S2', 'Boris'),
... ('S3', 'Cindy'),
... ('S4', 'Devinder'),
... ('S5', 'Boris'),
... ])
True
A useful shorthand for testing equality against an empty relation is to use the ``IS_EMPTY`` function:
.. sourcecode:: pycon
>>> print IS_EMPTY(IS_CALLED.where(lambda t: t.StudentId=='S99'))
True
>>> print not IS_EMPTY(IS_CALLED)
True
Inequality (!=, not ... ==)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. sourcecode:: pycon
>>> print IS_CALLED != COURSE
True
>>> print not IS_CALLED == COURSE
True
Proper Subset (<)
~~~~~~~~~~~~~~~~~
.. sourcecode:: pycon
>>> print IS_CALLED.where(lambda t: t.StudentId=='S3') < IS_CALLED
True
>>> print IS_CALLED.where(lambda t: t.StudentId.startswith('S')) < IS_CALLED
False
Subset (<=)
~~~~~~~~~~~
.. sourcecode:: pycon
>>> print IS_CALLED.where(lambda t: t.StudentId=='S3') <= IS_CALLED
True
>>> print IS_CALLED.where(lambda t: t.StudentId.startswith('S')) <= IS_CALLED
True
>>> print IS_CALLED.where(lambda t: t.StudentId=='S3') <= IS_CALLED.where(lambda t: t.StudentId.startswith('S')) <= IS_CALLED
True
Proper Superset (>)
~~~~~~~~~~~~~~~~~~~
.. sourcecode:: pycon
>>> print IS_CALLED > IS_CALLED.where(lambda t: t.StudentId=='S3')
True
>>> print IS_CALLED > IS_CALLED.where(lambda t: t.StudentId.startswith('S'))
False
Superset (>=)
~~~~~~~~~~~~~
.. sourcecode:: pycon
>>> print IS_CALLED >= IS_CALLED.where(lambda t: t.StudentId=='S3')
True
>>> print IS_CALLED >= IS_CALLED.where(lambda t: t.StudentId.startswith('S'))
True
Membership (in)
~~~~~~~~~~~~~~~
This is effectively the same as the subset comparison:
.. sourcecode:: pycon
>>> print IS_CALLED.where(lambda t: t.StudentId=='S3') in IS_CALLED
True
The membership operator can also be passed a tuple:
.. sourcecode:: pycon
>>> print Tuple(StudentId='S3', Name='Cindy') in IS_CALLED
True
>>> print Tuple(StudentId='S3', Name='Bob') in IS_CALLED
False
>>> print Tuple(StudentId='S3', Name='Cindy') not in IS_CALLED
False
>>> print Tuple(StudentId='S3', Name='Bob') not in IS_CALLED
True
Relational Operators
--------------------
We use a small core of relational operators to deliver a large number of operations. For example, we use ``&`` (relational AND) to provide natural join, intersection and Cartesian product, and we use it as the basis for implementing restriction and extension. A number of other operators are defined as macros on top of the core ones, e.g. ``GROUP``, and this number can easily be increased. The ideas behind this approach can be found in `The Third Manifesto`_ chapter 5.
One of the powerful uses of ``&`` is the natural join. This joins relations together on their commonly named attributes. To make the most of this, without having to rename attributes before each join, use the same name for the same attributes across relations, e.g. if a key on one relation is named "product_code" then use that same name in all other relations in case they need to be joined. Naming it "code" on the product relation and "product_code" on other relations would require the rename operator to be used before doing a natural join (not to mention making the two attributes appear to be different things).
The relational operators are defined as Python_ functions taking, and usually returning, relations. Many of the common ones are also defined as methods and operators on the Relation class.
Some basic operations on a relation are now presented.
Projection (project, remove)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is so called because each tuple of a relation can be thought of as a point in n-dimensional space (where n is the number of attributes), and selecting just a few of the attributes is akin to projecting those points onto the chosen axes.
Note once again that since a relation body is a set of tuples, there are no duplicate tuples.
.. sourcecode:: pycon
>>> print IS_CALLED.project(['Name'])
+----------+
| Name     |
+==========+
| Anne     |
| Boris    |
| Cindy    |
| Devinder |
+----------+
>>> print IS_CALLED(['Name'])
+----------+
| Name     |
+==========+
| Anne     |
| Boris    |
| Cindy    |
| Devinder |
+----------+
>>> print IS_CALLED.remove(['Name'])
+-----------+
| StudentId |
+===========+
| S1        |
| S2        |
| S3        |
| S4        |
| S5        |
+-----------+
>>> print IS_CALLED.remove(['Name', 'StudentId']) == IS_CALLED.project([]) == IS_CALLED([]) == DEE
True
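The way duplicates disappear under projection can be sketched in plain Python 3 without Dee. Here a relation body is modelled as a set of tuples, each tuple being a ``frozenset`` of ``(attribute, value)`` pairs; ``make_body``, ``project`` and ``remove`` are hypothetical helpers written for this illustration and are not Dee's own functions.

```python
def make_body(rows):
    """Build a relation body: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def project(body, attrs):
    """Keep only the named attributes; the set collapses any duplicate tuples."""
    keep = set(attrs)
    return {frozenset((k, v) for k, v in t if k in keep) for t in body}


def remove(body, attrs):
    """Drop the named attributes, i.e. project onto all the others."""
    drop = set(attrs)
    return {frozenset((k, v) for k, v in t if k not in drop) for t in body}


is_called = make_body([
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S3", "Name": "Cindy"},
    {"StudentId": "S4", "Name": "Devinder"},
    {"StudentId": "S5", "Name": "Boris"},
])

# Five tuples go in, but the two 'Boris' tuples collapse into one.
names = project(is_called, ["Name"])
print(sorted(dict(t)["Name"] for t in names))  # ['Anne', 'Boris', 'Cindy', 'Devinder']

# Projecting onto no attributes leaves a single empty tuple.
print(project(is_called, []) == {frozenset()})  # True
```

In this model, projecting a non-empty body onto no attributes yields exactly one empty tuple, which corresponds to the comparison with ``DEE`` above.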
Rename (rename)
~~~~~~~~~~~~~~~
This is crucial to our implementation since attributes with the same name are considered to represent the same thing. The mapping of old to new attribute name(s) is given as a Python_ dictionary (a Tuple would also do).
.. sourcecode:: pycon
>>> print IS_CALLED.rename({'Name':'NewName'})
+-----------+----------+
| StudentId | NewName  |
+===========+----------+
| S1        | Anne     |
| S2        | Boris    |
| S3        | Cindy    |
| S4        | Devinder |
| S5        | Boris    |
+-----------+----------+
>>> print IS_CALLED.rename({'StudentId':'NewId', 'Name':'NewName'})
+-------+----------+
| NewId | NewName  |
+=======+----------+
| S1    | Anne     |
| S2    | Boris    |
| S3    | Cindy    |
| S4    | Devinder |
| S5    | Boris    |
+-------+----------+
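The old-to-new mapping can be illustrated with a minimal plain Python 3 sketch that does not use Dee. Rows are modelled as ordinary dictionaries, and ``rename`` here is a hypothetical helper written for this example, not the library's method.

```python
def rename(rows, mapping):
    """Return new rows with attribute names replaced according to mapping.

    Attributes that do not appear in mapping keep their original name.
    """
    return [{mapping.get(name, name): value for name, value in row.items()}
            for row in rows]


is_called = [
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
]

renamed = rename(is_called, {"StudentId": "NewId", "Name": "NewName"})
print(renamed[0])  # {'NewId': 'S1', 'NewName': 'Anne'}
```

All names are replaced in a single pass, so a mapping that swaps two names works as expected. The sketch makes no attempt to detect a new name that clashes with an existing attribute.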
Restriction (where)
~~~~~~~~~~~~~~~~~~~
This is also known as relational selection, but that name can be confusing because the SELECT clause in SQL actually performs projection.
.. sourcecode:: pycon
>>> print IS_CALLED.where(lambda t: t.StudentId=='S4')
+-----------+----------+
| StudentId | Name     |
+===========+==========+
| S4        | Devinder |
+-----------+----------+
Natural Join, Times, Intersection (&)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you think about it, these are all the same thing - it just depends on whether the relations have some, none, or all of their attributes in common. We implement them all with the relational AND operator, spelled ``&`` in Python_.
Note that since a relation heading is a set of attributes, there are no duplicate attributes.
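To see why one operator covers all three cases, consider this minimal plain Python 3 sketch, which does not use Dee. Rows are modelled as dictionaries, and ``relational_and`` is a hypothetical helper written for illustration: it pairs up rows that agree on every commonly named attribute, so the outcome depends only on how many attribute names the two inputs share. The data is a cut-down version of the example relations.

```python
def relational_and(r1, r2):
    """Combine every pair of rows that agree on all commonly named attributes."""
    result = []
    for a in r1:
        for b in r2:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                merged = {**a, **b}
                # A relation body is a set, so skip duplicates.
                if merged not in result:
                    result.append(merged)
    return result


is_called = [
    {"StudentId": "S1", "Name": "Anne"},
    {"StudentId": "S2", "Name": "Boris"},
]
is_enrolled_on = [
    {"StudentId": "S1", "CourseId": "C1"},
    {"StudentId": "S1", "CourseId": "C2"},
    {"StudentId": "S2", "CourseId": "C1"},
]
course = [
    {"CourseId": "C1", "Title": "Database"},
    {"CourseId": "C2", "Title": "HCI"},
]
others = [
    {"StudentId": "S2", "Name": "Boris"},
    {"StudentId": "S9", "Name": "Zed"},
]

join = relational_and(is_called, is_enrolled_on)  # some attributes in common
product = relational_and(is_called, course)       # no attributes in common
intersection = relational_and(is_called, others)  # all attributes in common

print(len(join), len(product), len(intersection))  # 3 4 1
```

With no shared names the agreement test is vacuously true for every pair, which gives the Cartesian product; with all names shared, only identical rows survive, which gives the intersection.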
Natural Join - Some attributes in common
****************************************
.. sourcecode:: pycon
>>> print IS_CALLED & IS_ENROLLED_ON
+----------+-----------+----------+
| CourseId | StudentId | Name     |
+==========+===========+==========+
| C1       | S1        | Anne     |
| C2       | S1        | Anne     |
| C1       | S2        | Boris    |
| C3       | S3        | Cindy    |
| C1       | S4        | Devinder |
+----------+-----------+----------+
Times (Cartesian Join) - No attributes in common
************************************************
Beware: this kind of join can be very large and is almost always meaningless.
.. sourcecode:: pycon
>>> print IS_CALLED & COURSE
+----------+-----------+----------+-------------+
| CourseId | StudentId | Name     | Title       |
+==========+===========+==========+=============+
| C1       | S1        | Anne     | Database    |
| C1       | S2        | Boris    | Database    |
| C1       | S3        | Cindy    | Database    |
| C1       | S4        | Devinder | Database    |
| C1       | S5        | Boris    | Database    |
| C2       | S1        | Anne     | HCI         |
| C2       | S2        | Boris    | HCI         |
| C2       | S3        | Cindy    | HCI         |
| C2       | S4        | Devinder | HCI         |
| C2       | S5        | Boris    | HCI         |
| C3       | S1        | Anne     | Op Systems  |
| C3       | S2        | Boris    | Op Systems  |
| C3       | S3        | Cindy    | Op Systems  |
| C3       | S4        | Devinder | Op Systems  |
| C3       | S5        | Boris    | Op Systems  |
| C4       | S1        | Anne     | Programming |
| C4       | S2        | Boris    | Programming |