Could you tell me why I replaced the value with 1? #176
Comments
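It is a primitive noise cleanup. Normally, the variance is in the range > 0 and < 1. Sometimes there is a burst of activity (marketing advertisements, "oh! I found a nice library" articles, and so forth) and we need to clean it up somehow. The way I see it, we just replace it with the normal maximum value, which is 1. If you have any ideas on how we can improve this logic, you're welcome to share them. More logic is here: https://github.com/kkamkou/gitmostwanted.com/wiki/Repository-statuses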
I think the variance greatly exceeds that range in many cases. Could you tell
me about the range of the variance?
I think that range refers to the mean in this logic.
On Tue, Dec 6, 2016 at 6:58 AM, Kanstantsin Kamkou <[email protected]> wrote:
… It is a primitive noise cleanup. Normally, the variance is in range > 0
and < 1. Sometimes, there is a super activity (marketing advertisements,
"oh! I found a nice library" articles and so forth) and we need to clean it
up somehow. The way I see it is just replacing with the normal maximum
value, which is 1. If you have any opinion how can we improve this logic,
you're welcome.
More logic is here
<https://github.com/kkamkou/gitmostwanted.com/wiki/Repository-statuses>
You can run this query with any repository for 28 days. Then, just call variance.
return mean([1 if variance(chunk) >= 1000 else mean(chunk) for chunk in chunks])
But this logic says that when the variance is high, the value is replaced with 1, and
if not, it is just the mean.
Is that right?
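The expression quoted above can be expanded into a small runnable sketch. Only the threshold of 1000 and the replace-with-1 rule come from this thread; the function name `mean_with_noise_cleanup`, its parameters, and the use of the standard-library `statistics` module are assumptions made for illustration and may differ from the project's actual code.

```python
from statistics import mean, variance


def mean_with_noise_cleanup(values, chunk_size=7, variance_limit=1000):
    """Average the per-chunk means, replacing noisy chunks with 1.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Split the daily counts into consecutive chunks (weeks by default).
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

    cleaned = []
    for chunk in chunks:
        # statistics.variance needs at least two data points, so a
        # one-element trailing chunk is never treated as noisy.
        noisy = len(chunk) > 1 and variance(chunk) >= variance_limit
        cleaned.append(1 if noisy else mean(chunk))

    return mean(cleaned)


# One week with a spike (9999) followed by three flat weeks.
days = [3, 5, 0, 9999, 87, 15, 4] + [5] * 7 + [8] * 7 + [10] * 7
print(mean_with_noise_cleanup(days))
```

With this input the first week is flagged as noisy and counted as 1, so the result is the mean of 1, 5, 8 and 10 rather than a value dominated by the spike.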
On Tue, Dec 6, 2016 at 8:53 AM, Kanstantsin Kamkou <[email protected]> wrote:
… You can run this query
<https://github.com/kkamkou/gitmostwanted.com/blob/master/gitmostwanted/tasks/repo_stars.py#L17>
with any repository for 28 days. Then, just call variance.
Yes, if the variance is abnormal, we'll replace it with 1. And keep as is otherwise.
Then you mean that the variance does not have a 0 to 1 range. What does it
mean when the variance is above the 0 < variance < 1 range?
I want to understand your logic. Please help me.
On Tue, Dec 6, 2016 at 9:11 AM, Kanstantsin Kamkou <[email protected]> wrote:
… Yes, if the variance is abnormal, we'll replace it with 1. And keep as is
otherwise.
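Let's assume we have a repo. Every day we have a number of stars or forks. For example: 1: 3, 2: 5, 3: 0, 4: 9999, 5: 87, 6: 15, 7: 4, ... 28: 7. What we do next is splitting by 7: [3, 5, 0, 9999, 87, 15, 4], [...], [...], [...]. Then we check the variance of each of them. If the variance is huge, we assume that the mean value of the whole set is equal to 1. At the end we calculate the mean value of all 4 mean values, e.g. [1, 5, 8, 10]. It is primitive logic and should be improved (see #127).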
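To make the splitting step concrete, here is a rough sketch that splits a 28-day series into four 7-day chunks and reports the variance and mean of each. Days 1 to 7 and day 28 use the values from the example; every other day is made-up filler, and the helper name `summarize_chunks` and the limit of 1000 (taken from the expression quoted elsewhere in this thread) are assumptions, not the project's actual code.

```python
from statistics import mean, variance

# Days 1-7 and day 28 follow the example in the thread; the remaining
# values are invented filler so that the script can run.
stars_per_day = [
    3, 5, 0, 9999, 87, 15, 4,
    4, 6, 5, 5, 4, 6, 5,
    8, 7, 9, 8, 8, 7, 9,
    10, 11, 12, 10, 10, 10, 7,
]


def summarize_chunks(values, chunk_size=7, variance_limit=1000):
    """Return variance, mean and a 'noisy' flag for each chunk (hypothetical helper)."""
    summary = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        chunk_variance = variance(chunk)
        summary.append({
            "chunk": chunk,
            "variance": chunk_variance,
            "mean": mean(chunk),
            "noisy": chunk_variance >= variance_limit,
        })
    return summary


for row in summarize_chunks(stars_per_day):
    print(row["chunk"], round(row["variance"], 2), row["mean"], row["noisy"])
```

Only the first week, which contains the 9999 spike, crosses the limit; the quiet weeks keep their own means of 5, 8 and 10.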
Thank you for your kind explanation.
I understood that part.
I wonder why you made a logical leap.
Averaged data cannot have a range here, because it is not a percentage in
this situation.
Could you tell me why you replace the mean with 1?
- youngsoo
2016-12-06 9:58 GMT+09:00 Kanstantsin Kamkou <[email protected]>:
… Let's assume we have a repo. Every day we have a number of stars or
forks. For example:
1: 3 2: 5 3: 0 4: 9999 5: 87 6: 15 7: 4 ... 28: 7
What we do next is splitting
<https://github.com/kkamkou/gitmostwanted.com/blob/master/gitmostwanted/tasks/repo_status.py#L18>
by 7: [3, 5, 0, 9999, 87, 15, 4], [...], [...], [...]. Then we check the
variance of each of them. If the variance is huge, we assume that the mean
value of the whole set is equal to 1. At the end we calculate the mean value
of all 4 mean values, e.g. [1, 5, 8, 10]. It is primitive logic, and should be
improved here <#127>.
What was important for me was to clean up the noise. 1 is just not zero :) Therefore there is huge room to improve the logic. Maybe we could use …
Thank you for your reply. Have a good day.
gitmostwanted/tasks/repo_status.py:43
I cannot understand this substitution.
Does it mean more activity?
Thank you for your kindness.
youngsoo