This repository has been archived by the owner on Dec 20, 2021. It is now read-only.

Stats broken since January 2016 #22

Open
Themanwithoutaplan opened this issue Mar 30, 2016 · 43 comments

Comments

@Themanwithoutaplan

This is a minor niggle, but it looks like vanity has not been getting any updated statistics since about January. For example, vanity openpyxl is confidently telling me that the package has never been downloaded.

@aclark4life
Owner

@Themanwithoutaplan Yep, I think this is a PyPI issue.

@SmokinCaterpillar

Is there anything you can do about it? I like vanity. Outdated statistics, however, make it quite useless :-)

@aclark4life
Owner

@SmokinCaterpillar I like it too! We need to ask @dstufft or someone from @pypa to help.

@Themanwithoutaplan
Author

I think Donald is concentrating on getting Warehouse ready to replace PyPI. Stats should be more reliable once that's done.

@dstufft

dstufft commented Apr 23, 2016

As part of Warehouse I've been working on a new stats pipeline that should both be way more robust and provide a lot more insight into downloads.

@aclark4life
Owner

@Themanwithoutaplan @dstufft Any ETA on Warehouse? Might be worth fixing whatever annoyance has broken stats again at least once more to get us through…

@Themanwithoutaplan
Author

I think Warehouse is pretty close to being ready. Nobody likes touching the PyPI code and, given that it's been broken since January, I don't think another few days or weeks really matter.

Warehouse has a much clearer (and better) code base that will hopefully make it easier to maintain and more reliable. And help to add features.

@aclark4life
Owner

@Themanwithoutaplan Great! Nope, another few days or weeks don't really matter. Months on the other hand …

@ryukinix

They were talking about disabling the stats because they are distorted (mirror counts and so on). Can anybody explain to me what Warehouse is?

ryukinix added a commit to ryukinix/mal that referenced this issue May 16, 2016
@aclark4life
Owner

@ryukinix Ah, thanks for the cross ref. Warehouse is: https://github.com/pypa/warehouse

@ryukinix

Oh, it's nothing, and thank you for this nice tool! It's a little sad that it doesn't work now, but that's not your fault. xD

Warehouse looks interesting! Do we have any estimate of when it will be running in production? It would be nice to have vanity working again.

@aclark4life
Owner

@ryukinix According to @Themanwithoutaplan "pretty close to being ready" … and we should only have to live with broken stats "another few days or weeks". Practically speaking though, since it's a (much appreciated) volunteer effort, I would be happy if it happened sometime in 2016, period.

@aclark4life
Owner

https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html

@dstufft

dstufft commented May 25, 2016

Just to be clear. PyPI isn't using this data yet but it will be.


@aclark4life
Owner

@dstufft Yeah understood, thanks! Presumably some aggressive vanity user could start consuming it then add support to vanity :-)

@Themanwithoutaplan
Author

whistles and looks at his shoes.

@aclark4life
Owner

Is this fixed? I'm seeing stats again …

[Image: screenshot 2016-06-20 16 24 20]

@yotammanor

yotammanor commented Aug 10, 2016

Did you consider moving to using the BigQuery dataset, for the moment?

(As suggested here)

@aclark4life
Owner

Yep, suggested above too. Updating vanity to use the BigQuery data set is possibly a way to get old "missing" data back.

@aclark4life
Owner

Is it safe yet to remove the "stats broken" message from vanity? If so, I'll close this and make a new release.

@noxdafox

It seems stats are broken again.

requests-2.12.1-py2.py3-none-any.whl    2016-11-16       624953
              requests-2.12.2.tar.gz    2016-11-30            0
requests-2.12.2-py2.py3-none-any.whl    2016-11-30            0
              requests-2.12.3.tar.gz    2016-12-01            0
requests-2.12.3-py2.py3-none-any.whl    2016-12-01            0

@aclark4life
Owner

@noxdafox I think they've been broken since January, or at least not working consistently…

@dstufft

dstufft commented Dec 15, 2016

Sorry, I've had a lot of higher-priority items. I would suggest using the BigQuery database instead of the API, although it doesn't (and can't, since some of that data simply doesn't exist anymore) provide a cumulative download count reaching back before a certain date. Currently that cutoff is early 2016, but once I am able to backfill data it will move back to Jan 2014.

@Themanwithoutaplan
Author

@dstufft that would work for me. From a library developer's perspective I'm mainly interested in what's been happening recently: are people updating so I can kill old stuff?
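
For anyone wanting to try this, here is a minimal sketch of a "recent downloads per version" query against BigQuery. It is not what vanity does. It assumes the google-cloud-bigquery client library is installed, that Google credentials are already configured, and that the data lives in the public table `bigquery-public-data.pypi.file_downloads` (a table name that does not appear in this thread). The function names are made up for illustration.

```python
# Assumption: public table name, not taken from this thread.
TABLE = "bigquery-public-data.pypi.file_downloads"


def build_version_query(days=30):
    """Build SQL counting downloads per release over the last `days` days."""
    if isinstance(days, bool) or not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT file.version AS version, COUNT(*) AS downloads\n"
        f"FROM `{TABLE}`\n"
        "WHERE file.project = @project\n"
        f"  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)\n"
        "GROUP BY version\n"
        "ORDER BY downloads DESC"
    )


def downloads_by_version(project, days=30):
    """Run the query and return {version: download_count}.

    Needs network access and a Google account; the query is billed
    against whichever account the credentials belong to.
    """
    # Imported lazily so the query builder works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            # The project name is passed as a parameter, never interpolated.
            bigquery.ScalarQueryParameter("project", "STRING", project),
        ]
    )
    rows = client.query(build_version_query(days), job_config=job_config).result()
    return {row.version: row.downloads for row in rows}
```

Something like downloads_by_version("requests", days=7) would then show which releases are still being fetched. Restricting the time window also keeps the amount of data scanned, and so the cost, down.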

@dstufft

dstufft commented Dec 15, 2016

This may also be helpful: https://langui.sh/2016/12/09/data-driven-decisions/

@nschloe

nschloe commented Dec 15, 2016

@dstufft I'm reading there:

"Queries are charged against your account, but you get 1TB free per month and cached queries won't count against it."

Does this mean vanity will either have to ship with someone's personal credentials or ask the user to fill in their own credentials in a local config?

@dstufft

dstufft commented Dec 15, 2016

@nschloe Yes.
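
The second option, user-supplied credentials, might look roughly like the sketch below. The config file name (~/.vanityrc), its [bigquery] section, and the function name are hypothetical and not part of vanity. GOOGLE_APPLICATION_CREDENTIALS is the environment variable Google's client libraries read to locate a service account key file.

```python
import configparser
import os
from pathlib import Path


def find_credentials_path(config_file="~/.vanityrc", environ=None):
    """Return the path to a Google credentials file, or None if unset.

    Lookup order:
      1. the GOOGLE_APPLICATION_CREDENTIALS environment variable
      2. a `credentials` entry under [bigquery] in a local config file
         (hypothetical file name and layout)
    """
    environ = os.environ if environ is None else environ

    env_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if env_path:
        return Path(env_path).expanduser()

    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist.
    parser.read(Path(config_file).expanduser())
    configured = parser.get("bigquery", "credentials", fallback=None)
    return Path(configured).expanduser() if configured else None
```

With this approach nothing personal ships with the tool; each user points it at their own key file, and their own account is charged for queries.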

@nschloe

nschloe commented Dec 15, 2016

Sounds like this is the end of easy-to-get stats on Python projects then. Too bad.

Is there a download stats section planned for Warehouse?

@dstufft

dstufft commented Dec 15, 2016

I don't believe that needing a Google account is a significant barrier to accessing statistics. It is certainly more of a barrier than completely unauthenticated access, but not much IMO.

Warehouse will not get anything as powerful as raw access to the BigQuery table but I would like to add some "high value" metrics for projects that they can view.

@nschloe

nschloe commented Dec 15, 2016

"Warehouse will not get anything as powerful as raw access to the BigQuery table but I would like to add some "high value" metrics for projects that they can view."

Yes, that's what I meant; just a simple "download count in the last 30 days" or something along those lines. Something to brag about. 😉

@dstufft

dstufft commented Dec 15, 2016

Yeah, something like that, though it is fairly low on my list of priorities since (A) it's non-trivial to implement and (B) BigQuery is available.

@nschloe

nschloe commented Jan 16, 2017

I'm getting fairly reasonable numbers out of vanity again. Has something been silently fixed?

@piem

piem commented Jan 27, 2017

hi there,

it seems not everything was fixed:

[Image: aubio_vanity]

at least one person downloaded aubio 0.4.4 (me :-) ), some time ago already.

cheers, piem

@MartinPyka

Is there any alternative to vanity?

@aclark4life
Owner

@MartinPyka Not that I know of…

@nschloe

nschloe commented Apr 7, 2017

Again, I'm getting reasonable numbers for various projects. Has this been silently fixed?

@Themanwithoutaplan
Author

Could be related to PyPI having switched to Warehouse, even though Warehouse is still not quite finished.

@aclark4life
Owner

Going to try and tackle this one on Aug 5 at this event:

If anyone has any tips, please feel free to post them here (I know nothing about BigQuery going in.)

aclark4life changed the title from "Stats broken since January" to "Stats broken since January 2016" on Jul 18, 2017
@Themanwithoutaplan
Author

Hi Alex, I haven't worked with it myself but it's essentially a JSON API. httparchive is switching to it, so you might be able to get some idea of how it works from that code, though it's all JS. One example is here http://jsfiddle.net/rviscomi/1r6dpctd/ if you look at the source.

I think the biggest problem will be whether you need to use credentials to access the data. If so, you'll need to implement some kind of proxy somewhere. Based on the above example this may no longer be the case for public data sets: wget https://storage.googleapis.com/http-archive-beta.appspot.com/bytesJsTimeseries.json

Best of luck!

@ofek

ofek commented Aug 24, 2017

@MartinPyka This is what people are using now, if you still need an alternative: https://github.com/ofek/pypinfo

@aclark4life
Owner

@ofek Nice! Good to know this project exists. (Although I do take some offense to your statement "this is what people are using now …" srsly?)

@ofek

ofek commented Aug 24, 2017

@aclark4life Sorry about that, I meant no offense! It was regarding BigQuery usage, not download stats in general.

@aclark4life
Owner

@ofek No prob! Just finished installing and testing pypinfo, very nice …
