How does Tick determine what is a "useful" time series? #2

durple · 2014-11-04T17:11:26Z

So we could go one of three ways here:

1. Create a time series for everything: if a stream has attributes A, B, C, we create a time series for A, B, C, AB, AC, BC, ABC.
   - Pros: We have a time series for everything imaginable.
   - Cons: We have a time series for everything imaginable, including the useless ones, leading to a wasteland of time series data, e.g. a time series of unique identifiers that appear only once, or identifiers that are constantly changing.
2. Have a user pick and choose by some mechanism. So if I have users, locations and event_id, I can pick users over time and users over time broken down by location.
   - Pros: The user gets just the data he/she wants and it can be made available.
   - Cons: It sort of defeats the purpose of having Tick; there is no experimentation in this approach.
3. Lastly, have Tick determine what is "useful" using some measure of the volume and dimensions of a stream and the cardinality of the attributes themselves. The method of determination can be tweaked in various ways and experimented with. It may not always yield the right time series, but it can possibly be optimized over time to give better results.
   - Pros: Less waste. No user selection.
   - Cons: We don't know what the hell we are talking about here.
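
To make the first and third options concrete, here is a rough sketch (not anything Tick does today). It enumerates every attribute combination, then drops the ones whose keys almost never repeat. The function names, the `max_ratio` threshold, and the "distinct keys divided by events" measure are all assumptions made up for illustration.

```python
from itertools import combinations


def attribute_combinations(attributes):
    """Yield every non-empty combination of attribute names (option 1)."""
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            yield combo


def cardinality_ratio(events, combo):
    """Distinct key values divided by the number of events carrying the key.

    A ratio near 1.0 means almost every event has its own key, e.g. a
    unique identifier that appears only once.
    """
    keys = [tuple(e[a] for a in combo) for e in events if all(a in e for a in combo)]
    if not keys:
        return None
    return len(set(keys)) / len(keys)


def useful_combinations(events, attributes, max_ratio=0.5):
    """Keep combinations whose keys repeat often enough (hypothetical option 3)."""
    kept = []
    for combo in attribute_combinations(attributes):
        ratio = cardinality_ratio(events, combo)
        if ratio is not None and ratio <= max_ratio:
            kept.append(combo)
    return kept
```

With three attributes this produces seven candidate combinations, and the count doubles with every extra attribute, which is where the "wasteland" in option 1 comes from.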

mikedewar · 2014-11-04T17:13:00Z

I think this has different answers depending on the dimensionality of the series.

mikedewar · 2014-11-04T17:15:07Z

In 1D (like views on a page) you could make a case that volume is a good indicator of useful, or maybe variance? In >1D I bet covariance would be a good starting point.

A useless time series, then, is one that is always zero, or more generally always the same.
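
A minimal sketch of the 1D case, assuming a series is just a list of per-interval counts. `is_flat` and its `min_variance` threshold are hypothetical names, not part of Tick.

```python
from statistics import pvariance


def is_flat(counts, min_variance=0.0):
    """True when the series never moves (variance at or below the threshold)."""
    if len(counts) < 2:
        # Too little data to say the series varies at all.
        return True
    return pvariance(counts) <= min_variance
```

An always-zero series and an always-five series both come out flat here, which matches the "always the same" definition rather than just "always zero".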

mikedewar · 2014-11-04T17:18:59Z

Hey, also, I bet there is a proper answer to this in terms of information content. Like, a useful time series is one that is hard to compress, i.e. has high entropy, etc. There's a lovely green book on my desk by MacKay that has opinions...
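
One rough way to probe the information-content idea, assuming a series is a list of integer counts: measure the empirical entropy of its values, and see how well a general-purpose compressor shrinks it. Both helpers are illustrative assumptions, and zlib is only a crude stand-in for a proper information measure.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(values):
    """Empirical entropy, in bits per sample, of the values in a series."""
    if not values:
        return 0.0
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compression_ratio(values):
    """Compressed size divided by raw size; lower means more redundant."""
    raw = ",".join(str(v) for v in values).encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)
```

A constant series has zero entropy and compresses to almost nothing, so under this view it carries no information worth storing.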

nikhan · 2014-11-04T17:55:35Z

I am more curious as to how this makes sense for a user

Whatever determination you use, it means that there will be a result for some queries and no result for others. And neither of those results would necessarily mean "because there was no data"

which is confusing to me.

durple · 2014-11-04T18:00:52Z

That is a very fat book!

durple · 2014-11-04T18:21:27Z

> Whatever determination you use, it means that there will be a result for some queries and no result for others.

This is quite implementation-specific, I think. We could implement Tick such that a user knows what time series are being made available once it starts listening to the stream.

But you are right: if I have A, B & C, and Tick determined that A and A & B are the only useful time series, but the user was interested in B & C, I don't know how to handle that. It almost becomes a back-to-the-drawing-board problem.

mikedewar · 2014-11-04T18:32:52Z

What about thinking of it more as a compression problem? If there is no information in the series given the other time series then you should be able to recreate the series using other series at query time...

Alternatively, a "no information" response to a query is an interesting thing for a db to respond with...

durple · 2014-11-04T18:36:04Z

Still wrapping my head around thinking of it as a compression problem...just grabbed the green book

> Alternatively, a "no information" response to a query is an interesting thing for a db to respond with...

But is it useful if I am looking for something very specific?

nikhan · 2014-11-04T18:39:07Z

> Alternatively, a "no information" response to a query is an interesting thing for a db to respond with...

only if it can be explained simply

nikhan · 2014-11-04T18:40:21Z

If you have timeseries for each key, couldn't you create what A&B would be? why do you need a time series for groups?

nikhan · 2014-11-04T18:44:46Z

Oh right, intersection vs exclusive. oh well

nikhan · 2014-11-04T18:47:56Z

Can I have table "key" with row "co occurrence" by time?

durple · 2014-11-04T18:55:09Z

> If you have timeseries for each key, couldn't you create what A&B would be? why do you need a time series for groups?

No. Consider for example the following stream:

{user: Deep, location: NYC, ts: 1}
{user: Deep, location: NJ, ts: 1}
{user: Nik, location: NYC, ts: 1}
{user: Nik, location: SFO, ts: 1}
{user: Mike, location: NYC, ts: 1}
{user: Deep, location: NYC, ts: 1}

## Time series:

user
Deep -> (ts: 1, count: 3)
Nik -> (ts: 1, count: 2)
Mike -> (ts: 1, count: 1)

location
NYC -> (ts: 1, count: 4)
NJ -> (ts: 1, count: 1)
SFO -> (ts: 1, count: 1)

And if my question is "give me all the times Deep was in NYC", I can't work it out from the above time series. I can, however, work it out from:

user,location
Deep,NYC -> (ts: 1, count: 2)
Deep,NJ -> (ts: 1, count: 1)
Nik,NYC -> (ts: 1, count: 1)
Nik,SFO -> (ts: 1, count: 1)
Mike,NYC -> (ts: 1, count: 1)
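
The same hand-built example as a small sketch, in case anyone wants to poke at it. `build_series` is a hypothetical helper, not Tick's API; it just counts events per key and timestamp.

```python
from collections import Counter

# The six events from the example stream above.
events = [
    {"user": "Deep", "location": "NYC", "ts": 1},
    {"user": "Deep", "location": "NJ", "ts": 1},
    {"user": "Nik", "location": "NYC", "ts": 1},
    {"user": "Nik", "location": "SFO", "ts": 1},
    {"user": "Mike", "location": "NYC", "ts": 1},
    {"user": "Deep", "location": "NYC", "ts": 1},
]


def build_series(events, key_attrs):
    """Count events per (key values, ts) for the given tuple of attributes."""
    series = Counter()
    for event in events:
        key = tuple(event[attr] for attr in key_attrs)
        series[(key, event["ts"])] += 1
    return series


by_user = build_series(events, ("user",))
by_location = build_series(events, ("location",))
by_user_location = build_series(events, ("user", "location"))

# The single-attribute series only say Deep has 3 events and NYC has 4.
# How many of those overlap is only recorded in the combined series.
deep_in_nyc = by_user_location[(("Deep", "NYC"), 1)]
```

The single-attribute counts only bound the answer (somewhere between 1 and 3 here), so the combined series really is extra information rather than something derivable at query time.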

durple · 2014-11-04T18:56:21Z

> Oh right, intersection vs exclusive. oh well

Great! I spent 5 minutes building time series by hand from a stream of imaginary JSON.

nikhan · 2014-11-04T18:59:57Z

sorry 😧

durple · 2014-11-04T19:06:52Z

> Can I have table "key" with row "co occurrence" by time?

Not sure I understand. Isn't that the same as having more than one column as a primary key? If so, it becomes the same as what I showed in the example.

nikhan · 2014-11-04T19:11:10Z

what is wrong with that?

mikedewar · 2014-11-04T19:14:32Z

amen re: explaining no data

durple added the question label Nov 4, 2014

durple changed the title ~~How does Tick determine what is a "useful" time series~~ How does Tick determine what is a "useful" time series? Nov 4, 2014
