Exact timestamp for same user with different subjects #201

avis1234 · 2015-06-17T08:59:52Z

We are integrating with the galaxy_zoo stream. Some events arrive with the same user_id and the same created_at but different subjects. Logging this issue per our discussion with your team.

chrissnyder · 2015-06-17T09:22:14Z

Are the annotations within the classifications the same?

willettk · 2015-06-17T09:46:19Z

According to @parrish, no. Here are the annotations for three classifications with the same timestamp and user, but different subjects and annotations.

[{"lang"=>"en"}, {"sloan_singleband-0"=>"a-2"}, {"sloan_singleband-11"=>"a-1"}]
[{"lang"=>"en"}, {"sloan_singleband-0"=>"a-1"}, {"sloan_singleband-1"=>"a-1"}, {"sloan_singleband-2"=>"a-1"}, {"sloan_singleband-3"=>"a-1"}, {"sloan_singleband-4"=>"a-3"}, {"sloan_singleband-5"=>"a-1"}, {"sloan_singleband-11"=>"a-1"}]
[{"lang"=>"en"}, {"sloan_singleband-0"=>"a-2"}, {"sloan_singleband-11"=>"a-1"}]
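For anyone who wants to find these cases in their own copy of the stream, here is a minimal sketch that groups events by user and timestamp and keeps the groups spanning more than one subject. The `user_id` and `created_at` names come from the report above; `subject_id`, the dict layout, and the sample values are assumptions made for illustration, not the stream's actual schema.

```python
from collections import defaultdict


def find_timestamp_collisions(events):
    """Return groups of events that share (user_id, created_at) but differ in subject."""
    groups = defaultdict(list)
    for event in events:
        groups[(event["user_id"], event["created_at"])].append(event)
    # Keep only the groups that contain more than one distinct subject.
    return {
        key: group
        for key, group in groups.items()
        if len({e["subject_id"] for e in group}) > 1
    }


# Hypothetical sample events, not real stream data.
events = [
    {"user_id": "u1", "created_at": "2015-06-17T08:00:00Z", "subject_id": "s1"},
    {"user_id": "u1", "created_at": "2015-06-17T08:00:00Z", "subject_id": "s2"},
    {"user_id": "u2", "created_at": "2015-06-17T08:00:00Z", "subject_id": "s3"},
]
collisions = find_timestamp_collisions(events)
```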

parrish · 2015-06-17T15:10:39Z

Unfortunately, this is unavoidable. When the API receives a classification, it timestamps it immediately. The timestamps you're seeing in the data are set when the classification is created.

Some common scenarios that cause this:

A mobile user, or a user on a flaky network connection (very common):

1. They begin classifying
2. Their network connection drops out
3. They continue classifying
4. When they reconnect, their classifications finish sending to the API simultaneously

Or in times of unusually high traffic (less common):

1. The web server receives classifications faster than the requests can be processed
2. The requests queue up at the server in front of the API
3. The requests are pulled out of the queue and processed concurrently, resulting in identical timestamps

The only way to approach this is to have the client timestamp the classifications before they are sent. The caveat here is that there are no guarantees on what the client system clock is set to.

I suppose you could try to calculate a client local time offset by comparing it to a response from the server and adjusting for network latency, but that's pretty far from reliable.
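A rough sketch of that offset calculation, for illustration only. It assumes network latency is the same in both directions, which is exactly the part that makes the estimate unreliable. The function names and the numbers are hypothetical, and all times are in seconds.

```python
def estimate_clock_offset(client_sent, server_time, client_received):
    """Estimate (server clock - client clock), assuming symmetric latency.

    client_sent / client_received are read from the client clock around one
    request; server_time is the server clock value returned in the response.
    """
    round_trip = client_received - client_sent
    # Assume the server read its clock halfway through the round trip.
    client_midpoint = client_sent + round_trip / 2
    return server_time - client_midpoint, round_trip


def to_server_time(client_timestamp, offset):
    """Shift a client-side timestamp onto the server's clock."""
    return client_timestamp + offset


# Hypothetical numbers: the client clock runs about 60 s behind the server.
offset, round_trip = estimate_clock_offset(100.0, 160.2, 100.4)
adjusted = to_server_time(250.0, offset)
```

The error of the estimate is bounded by half the round-trip time, so a slow or asymmetric connection (the flaky mobile case above) gives the worst results.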

In a nutshell, you could figure out the order in which requests are sent, but not the actual time each request is sent.
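One way to recover that order, sketched under assumptions: the client attaches a per-session counter to each classification before queuing it, and the consumer sorts on that counter instead of on the timestamp. `ClassificationQueue`, `client_seq`, and `in_send_order` are hypothetical names for illustration; nothing here is part of the existing API or client.

```python
import itertools


class ClassificationQueue:
    """Client-side buffer that tags each classification with a sequence number."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.pending = []

    def record(self, subject_id):
        # The sequence number is assigned when the user classifies,
        # not when the request finally reaches the server.
        self.pending.append(
            {"subject_id": subject_id, "client_seq": next(self._counter)}
        )

    def flush(self):
        """Return everything buffered so far, e.g. after reconnecting."""
        sent, self.pending = self.pending, []
        return sent


def in_send_order(received):
    """Order classifications by the client counter, ignoring created_at."""
    return sorted(received, key=lambda c: c["client_seq"])


queue = ClassificationQueue()
for subject in ["s1", "s2", "s3"]:
    queue.record(subject)

# Simulate the server processing the burst out of order with one timestamp.
received = [
    dict(c, created_at="2015-06-17T08:00:00Z") for c in reversed(queue.flush())
]
ordered = [c["subject_id"] for c in in_send_order(received)]
```

This only orders classifications within one client session; it says nothing about when each one was actually made.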
