This repository has been archived by the owner on Mar 25, 2019. It is now read-only.

Exact timestamp for same user with different subjects #201

Open
avis1234 opened this issue Jun 17, 2015 · 3 comments

Comments

@avis1234

We are integrating with the galaxy_zoo stream. Some events arrive with the same user_id and the same created_at, but different subjects. Logging this issue per our discussion with your team.

@chrissnyder
Contributor

Are the annotations within the classifications the same?

@willettk
Copy link
Contributor

According to @parrish, no. Here are the annotations for three classifications with the same timestamp and user, but different subjects and annotations.

[{"lang"=>"en"}, {"sloan_singleband-0"=>"a-2"}, {"sloan_singleband-11"=>"a-1"}]
[{"lang"=>"en"}, {"sloan_singleband-0"=>"a-1"}, {"sloan_singleband-1"=>"a-1"}, {"sloan_singleband-2"=>"a-1"}, {"sloan_singleband-3"=>"a-1"}, {"sloan_singleband-4"=>"a-3"}, {"sloan_singleband-5"=>"a-1"}, {"sloan_singleband-11"=>"a-1"}]
[{"lang"=>"en"}, {"sloan_singleband-0"=>"a-2"}, {"sloan_singleband-11"=>"a-1"}]
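For anyone consuming the stream, a minimal sketch of how such collisions could be spotted. The `user_id` and `created_at` fields are the ones named in this issue; the `subject_id` field name and the sample events are assumptions made up for illustration, not the stream's documented schema.

```python
from collections import defaultdict


def find_timestamp_collisions(events):
    """Return groups that share (user_id, created_at) but cover several subjects."""
    groups = defaultdict(list)
    for event in events:
        groups[(event["user_id"], event["created_at"])].append(event)
    return {
        key: group
        for key, group in groups.items()
        # Only report groups with more than one distinct subject.
        if len({e["subject_id"] for e in group}) > 1
    }


# Hypothetical sample events; "subject_id" is an assumed field name.
events = [
    {"user_id": "u1", "created_at": "2015-06-17T12:00:00Z", "subject_id": "s1"},
    {"user_id": "u1", "created_at": "2015-06-17T12:00:00Z", "subject_id": "s2"},
    {"user_id": "u1", "created_at": "2015-06-17T12:00:00Z", "subject_id": "s3"},
    {"user_id": "u2", "created_at": "2015-06-17T12:00:05Z", "subject_id": "s1"},
]

collisions = find_timestamp_collisions(events)
```

Grouping on the pair rather than on the timestamp alone avoids flagging different users who happen to classify in the same instant.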

@parrish
Contributor

parrish commented Jun 17, 2015

Unfortunately, this is unavoidable. When the API receives a classification, it timestamps it immediately. The timestamps you're seeing in the data are set when the classification is created.

Some common scenarios that cause this:

A mobile user, or a user on a flaky network connection (very common)

  • They begin classifying
  • Their network connection drops out
  • They continue classifying
  • When they reconnect, their classifications finish sending to the API simultaneously
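The flaky-connection scenario can be sketched as a small simulation. Everything here is a hypothetical stand-in: `FakeApi`, `BufferingClient`, the `subject_id` field, and the whole-second fake clock are invented for illustration and are not the real API or client code.

```python
class FakeApi:
    """Hypothetical stand-in for the server: stamps each classification on receipt."""

    def __init__(self, clock):
        self._clock = clock
        self.received = []

    def receive(self, classification):
        # created_at reflects arrival at the API, not when the user classified.
        self.received.append(dict(classification, created_at=self._clock()))


class BufferingClient:
    """Hypothetical client that queues classifications while offline."""

    def __init__(self, api, user_id):
        self.api = api
        self.user_id = user_id
        self.online = True
        self.pending = []

    def classify(self, subject_id):
        self.pending.append({"user_id": self.user_id, "subject_id": subject_id})
        if self.online:
            self.flush()

    def reconnect(self):
        self.online = True
        self.flush()

    def flush(self):
        # Send queued classifications in the order they were made.
        while self.pending:
            self.api.receive(self.pending.pop(0))


now = {"t": 100}  # fake clock, whole seconds (assumed resolution)
api = FakeApi(clock=lambda: now["t"])
client = BufferingClient(api, user_id="u1")

client.classify("s1")   # online: stamped at t=100
client.online = False   # connection drops
now["t"] = 130
client.classify("s2")   # queued locally
now["t"] = 160
client.classify("s3")   # queued locally
now["t"] = 200
client.reconnect()      # both queued items reach the API together

stamps = [(c["subject_id"], c["created_at"]) for c in api.received]
```

In this run the two classifications made 30 seconds apart while offline both end up stamped with the reconnect time.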

Or in times of unusually high traffic (less common)

  • The web server receives classifications faster than the requests can be processed
  • The requests queue up at the server in front of the API
  • The requests are pulled out of the queue and processed concurrently, resulting in identical timestamps

The only way to approach this is to have the client timestamp the classifications before they are sent. The caveat here is that there are no guarantees on what the client system clock is set to.

I suppose you could try to calculate a client local time offset by comparing it to a response from the server and adjusting for network latency, but that's pretty far from reliable.
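A rough sketch of that offset calculation, assuming the client records its own clock just before sending and just after receiving, and that the response carries the server's time. The function name and the millisecond inputs are illustrative assumptions. The estimate treats latency as symmetric, which real networks do not guarantee, so the uncertainty returned alongside it is half the round trip.

```python
def estimate_clock_offset(client_send_ms, client_receive_ms, server_time_ms):
    """Estimate (server clock - client clock) in ms, assuming symmetric latency.

    Returns (offset_ms, uncertainty_ms). If the server read its clock at some
    point between send and receive, the true offset lies within
    +/- uncertainty_ms of the estimate.
    """
    round_trip = client_receive_ms - client_send_ms
    if round_trip < 0:
        raise ValueError("receive time precedes send time")
    # Guess that the server read its clock halfway through the round trip.
    midpoint = client_send_ms + round_trip / 2
    return server_time_ms - midpoint, round_trip / 2


# Illustrative numbers: a 400 ms round trip, client clock about 5 minutes slow.
offset_ms, uncertainty_ms = estimate_clock_offset(
    client_send_ms=1_000_000,
    client_receive_ms=1_000_400,
    server_time_ms=1_300_200,
)
```

On a slow or congested mobile connection the round trip can run to seconds, and the uncertainty grows with it, which is why this is far from reliable.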

In a nutshell, you could figure out the order that requests are sent in, but not the actual time the request is sent.
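A minimal sketch of recovering send order on the consumer side. Since only the order is recoverable, a per-client counter is swapped in here for the client timestamp; that substitution, the `client_seq` field, the `subject_id` field, and the `SequencedClient` class are all assumptions for illustration, not something the API provides.

```python
from itertools import count


class SequencedClient:
    """Hypothetical client that numbers each classification before sending."""

    def __init__(self, user_id):
        self.user_id = user_id
        self._seq = count(1)

    def build(self, subject_id):
        return {
            "user_id": self.user_id,
            "subject_id": subject_id,
            "client_seq": next(self._seq),  # assumed field, set client-side
        }


def in_sent_order(events):
    """Sort events per user by the client-assigned sequence number."""
    return sorted(events, key=lambda e: (e["user_id"], e["client_seq"]))


client = SequencedClient("u1")
sent = [client.build(s) for s in ("s1", "s2", "s3")]

# Simulate the server stamping all three identically and delivering them shuffled.
arrived = [dict(e, created_at="2015-06-17T12:00:00Z") for e in (sent[2], sent[0], sent[1])]

ordered_subjects = [e["subject_id"] for e in in_sent_order(arrived)]
```

A counter does not depend on the client's system clock being set correctly, but it says nothing about how much time passed between classifications.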
