-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathideas.txt
107 lines (76 loc) · 3.32 KB
/
ideas.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
recording and querying sensor data
- this can include sampling CPU, thermometer, disk usage, car speed, etc over an interval
(usually would be fixed interval per sensor, but doesn't matter)
- search by time period per source
- search for samples that were between X and Y for a given sensor
sensor data roll ups
- aggregate several "sources" for some roll ups
general query service implementation (w/o solr)
- cassandra can do some stuff, but full text search, no
- really not a good fit for cassandra
general object persistence
- ok for cassandra except for updates! means read-lock-write (which cassandra doesn't support at the moment)
blog site
- "create-user" - user (email, password, name, etc - can add properties anytime)
- "crate-post" - blog posts
- "create-comment" - comments
- "show-post, show-comment" - votes per blog post, per comment
- "show-user, show-user-comments" - activity by user for posts and comments
- give me all comments sorted: by vote, by timestamp
- "not done" - activity for entire site (logging)
working session
- could have them evolve user/post objects and see affects on data model
done - ColumnFamily: users
--------------------
key = email
required columns: email, password, name
comparator: bytes (no sorting needed)
done - ColumnFamily: posts
-------------------
key = TimeUUID
required columns: user-email, timestamp, title, text
comparator: bytes (no sorting needed)
done - ColumnFamily: comments
----------------------
key = TimeUUID
required columns: post-id, user-email, user-name, timestamp, text
comparator: bytes (no sorting needed)
done - ColumnFamily: user-posts
------------------------
key = email
required columns: post-uuid
comparator: TimeUUID (sort by create timestamp)
done - ColumnFamily: post-comments
-------------------
key = TimeUUID (post-uuid)
required columns: TimeUUID
value: UUID of comment
comparator: TimeUUID (sort by create timestamp)
done - ColumnFamily: user-comments
---------------------------
key = email
required columns: comment-uuid
comparator: TimeUUID (sort by create timestamp)
done - ColumnFamily (counter): votes
-----------------------------
key = UUID
required columns: votes
comparator: bytes (no sorting needed)
How to sort comments by vote
----------------------------
- record vote using distributed counters CF, votes
- record the vote in CF, user_votes, so a user can only vote once per post or comment
- record in CF, comment_vote_ledger, on 5 minute keys: col name = PostUUID, col value = empty
process awakes every N seconds
- read last time performed "post vote sorting" from CF, system-data.
- read rows from CF, comment_vote_ledger, since "last time" until present. each row determines what Posts had comment
votes for that time interval, which means their votes must be sorted.
- for each row read the column names, which are Post UUIDs. Read all comment votes for Post, sort the comments by
vote total, then write to post_comments_sorted_by_vote using: key = PostUUID, col name = votes:CommentUUID (delete entire row first)
- writes out "last time"
How to sort for Top N posts over last M days
--------------------------------------------
- each day at midnight read from CF, posts_by_time, 30 days worth of post UUIDs
- for each Post UUID:
- lookup its current votes from CF, votes
- sort the posts by votes and write to CF, posts_sorted_by_vote