Together with data scarcity, the other aspect that we should never underestimate when we speak about data is whether they are representative of the phenomenon we aim to study. In a recent article by Anna Rogers, the author considers the following argument: “the size of the data is so large that, in fact, our training sets are not a sample at all, they are the entire data universe”. Rogers replies that this argument would stand if the “data universe” that we use for training, for instance, a speech recognition system was the same as “the totality of human speech/writing”. It is not, and will hopefully never be, because collecting all speech is problematic for ethical, legal, and practical reasons. Anything less than that is a sample. Given the existing social structures, no matter how big that sample is, it is not representative due to (at least) unequal access to technology, unequal possibility to defend one’s privacy and copyright, and limited access to the huge volumes of speech produced in the “walled garden” platforms like Facebook.
The use of Rogers's name does not make it clear that the article is being formally referenced.
References are difficult to link to from the text, which could confuse readers. For example, in module 1:
rds-course/coursebook/modules/m1/1.1-WhatIsDataScience.ipynb
Line 59 in 53c121a
rds-course/coursebook/modules/m1/1.1-WhatIsDataScience.ipynb
Line 265 in 53c121a
I can see two possible solutions for making references more explicit: