Extended quotes corresponding to the themes

Some notes:

Quotes may be examined by expanding the folded sections.
Participant IDs have been redacted (set to PX) to protect their privacy.

Why do data scientists access the scientific literature?

Actively understanding disciplinary norms

understanding-discipline, 12

Sure i've been doing full time research for about two years and so often when i'm starting work in a problem ... i'm not sufficiently familiar with to work to really know what the typical approaches are, how is this evaluated, what kinds of approaches people are falling out of favor versus becoming more accepted by the community - PX

And then the second goal might be that when you are new to a field you might need to learn about some terminology of that field. So for example, right now I am exploring areas in RL which I am not aware of. So it is mostly to build my own vocabulary of how certain things are defined, what the different terminology are. ... And third to build the mental map of the whole field, where things exist. - PX

So when we get there was to get like familiarized with this like huge theoretical framework that exists understand how people were historically talking about it understand how people have since then changed how they talk about it, and today, how is it seen in the field. - PX

action-seek-dataset, 8
action-seek-metrics, 4

And finally, like once I have formulated like a problem statement that i'm working on figuring out what the standard practices in terms of evaluation. - PX

Again, what types of .. benchmark tests are people doing for protocols like the ones that i'm trying to propose ... how do I assess proposal in a way that is appropriate for my for my audience - PX

action-understand-problem, 11

And then just maybe get a crux of stuff like what's the label space for the data sets, in a normal production setting its neighbors space is really big. What are some data sets that are more tractable How does that affect the numbers .. - PX

So my rationale in detail is to understand what parts of the problem is addressed, what are their assumptions, and lets say if I find something which i can improve upon - PX

while I tried to find the flaws in that and the shortcomings of those - PX

action-seeking-similar-problems, 8

The first question that comes to mind for any researcher is what has been already done, are there similar problems which have already been tackled Or related problems whose corresponding methods might be used in my problem and i use it. - PX

So this [Related Problems] section eventually would be an important section for that context, I would I would want to see what, and this is, I wouldn't say You know, like it or hate it and probably most people love Wikipedia but for most generic enough problems or keywords there is this section and it's an incredibly powerful section. I think I would go through these, not through all of these maybe like this one i'm less interested in, but these are absolutely all relevant sections, so I will follow up on it. - PX

Resources sought for active understanding (excluded from the paper)

seeking-survey-papers, 7

What I [try] to do is start with a survey paper first. So that I know which papers to focus on and then based on that go deeper individually or those papers. - PX

You know, maybe or a blog post, which is actually doing a survey of the different methods in meta learning. - PX

Mostly i like to hear talks of keynote speakers or tutorials. Actually talks often do a literature review of a certain field. They do a good job of telling you where it is. - PX

First stop I have found for a lot of theoretical work that I'm doing is Wikipedia, obviously you cant cite Wikipedia and you cant take its word for it, but it is a great resource. - PX

seeking-latest-papers, 5

Especially with a new papers if any related paper and which is not yet registered in Google scholar. Then that helps because in Google scholar it's difficult to find the most recent papers but Twitter when sometimes people upload. - PX

seeking-influential-papers, 3

Research everything, so you do keyword search and try to find papers that lead you to something like some paper that you can trust so. - PX

it's mostly trying to find the core papers that are very influential with regards to the topic and that's mostly through Google scholar where you see which ones have the most citations. - PX

I want to say i'm just unaware of this area in general, so. I want to find an older paper which people are excited to which, which has like a how the how the How the field like this particular sub area of domain generalization started. - PX

action-seek-older-paper, 5

There is one other thing about that discipline that made it challenging which was that that area was very active in like the 90s. And so, a lot of the papers that are relevant to our study where were older and so they're published in different venues, and they use different techniques and it was just some. It took it wasn't. The approaches that they use were not familiar to me. It made it much harder to skim papers. And things get to the essence of what. Is the contribution of that paper. - PX

Okay, but yeah I think this sort of challenge of like I think things are rediscovered in different circles waiting often things are also forgotten with. Like you're fine tables the 80s and the 90s. somewhat similar idea. What have you discovered in different contexts. and often it feels like we're doing the same kinds of like. we're doing the same thing again again. - PX

This particular papers on 2010 so if I need to understand something very basic. Which isn't., Only present in papers which are which came earlier. - PX

seeking-cross-domain-papers, 6

So, because this context doesn't really hasn't existed before there's actually like almost no existing research published research in my on my topic that it really exists on the Internet. And so i've had the new challenge of having to Look, for things that are like tangential to my texts - PX

And so it's very hard to sort of figure out whether it's been done before, or even sort of mapping some the same thing in one field to play this on any field us very specifically here Okay, but yeah I think this sort of challenge of like I think things are rediscovered in different circles waiting often things are also forgotten with.. - PX

seeking-demonstration-papers, 1

Passively following a discipline

action-updating-on-lit, 9

Literally reading from time to time to keep up with what's happening in the field. - PX

It is to help me understand the trends in my research areas or understand what other people are doing or are excited by. - PX

Where the community is going, or what people that I have previously followed the works of are up to right now. - PX

In the slack channel that I mentioned it's more open ended like I'm not working on the same thing. ... that sort of also gives me some sort of view into what everybody else is doing. - PX

You know relevant reading for my own research what I had in mind ... to an extent, where, at some point, I definitely thought I would in the future do research. [But] now I think that that's been on the on the back burner, it's not really something that's likely to materialize but i'm still interested in it. - PX

But then there is a alternate goal of exploration, and to have more food for thought. There are other projects which I am also exploring and I might be interested in it at some point. - PX

pp-subscribing-but-not-opening, 5

It would be nice to like be able to like keep up with people's work ... Maybe at least you open up 10% of this stuff instead of being like this is just junk mail - PX

yeah I don't heavily keep up with them, because it gets overwhelming, but I think from time to time, if I have the time yeah I go through [google scholar alerts] - PX

I could very easily you know miss a lot of interesting threads, I know I missed a lot of interesting threads, they just get buried. - PX

Brainstorming solutions

seeking-research-frontier, 10

making sure i'm not redoing stuff that's been done before that's like one major aspect figuring out like open questions and existing work that I could work on - PX

so something i'm also interested in is like how what kind of recent publications exist on this topic, because yeah there might stuff in the past happen that might be cool but chances are like More recent literature is always going to be more useful to me - PX

And sometimes it happens with me that some papers I am building my ideas upon are very old. Like 2002, 2003. So in that case i need to find papers which are more relevant and recent to see if people are even working on something like this now. - PX

And the hope is that while we are looking at the existing research we'll find something which has not been experimented on before and we can use those experiments and that'll be the innovation component. - PX

extend-prior-work, 7
action-seeking-similar-methods, 13 (related to diff-to-paper)

You know people could be pointing to a better solution or people could be pointing to a different kind of solution - PX

Since then, like after you figure [the problem] out it's like I have an idea for what you could do better, and so, then it's seeing if others have done something similar before. - PX

One what has been done even remotely close to this, so accepting the differences, but understanding what the differences are. - PX

ensuring-novelty, 10

So sometimes it is just to see if someone has already done this. Like someone has exactly worked on this and has done a work like this. So its more to confirm that. Yeah so is it even a significant contribution, and to be honest can someone do it better. - PX

I see like what what is out there who are me what what our competitors doing we kind of know like you know certain certain players in the space in the document space - PX

Seeking solutions for application

method-finding, 16

So when i search for literature I am looking for existing methods which do what i am doing. What are other people doing to solve this. Is it even solved? Are there methods which already solve this problem? - PX

What are the general approaches that people have taken to solve a given particular task so for. ... That helps when we are trying to publish a paper ... So this forms what are the existing baseline methods to compare against to for quantitative research - PX

and almost always, just for scientific completeness, you always try existing methods first before we jump into something new - PX

action-seek-method-context, 4

you've kind of figured out a class of models you're also looking at what our experiences other people have had ... and how do we transform this algorithm into our use case. - PX

so I went back to that for like I guess two reasons, I wanted to figure out under what circumstances are people even interested in like constraint decoding like what kind of problems. - PX

And then, for another The other thing I will look this how other people have applied it to see if other people have applied it to similar data or similar use cases to mine. - PX

reproducing-experiments, 2

Package an idea (excluded from the paper)

contextualize-own-work, 8

At the end, if am writing a paper, ive developed a method, i have run my experiments, i am at the end stage. Whatever you are proposing, you need to locate it in the existing landscape of science and scientific literature. So its a duty but its also how we do research. Its the core of how you advance scientific knowledge in a way. - PX

Is the situation where this research problem or research idea didn't come about as a result of previous research this came about because of a real world setting at my company, where a need was observed by engineers and users of the system And so I was You know sort of given this concept by researchers, or by by these people as a researcher to continue looking into it So in that context Now i'm looking at the existing literature to even see how do I connect this to the To what it really exists - PX

So those are the two modes of discussion. And sometimes if a paper is highly relevant to a project we are doing, more often with my lab mates. Lets say I am doing a project with someone, i schedule a meeting to discuss that paper in detail so that we understand different aspects of it and we can discuss what we are doing differently so that we can position our work better - PX

recontextualize-own-work, 4 (strong overlaps to contextualize-own-work)
lit-review-due-diligence, 6 (strong overlaps to contextualize-own-work)
action-seek-method-applications, 4

there's various times during the writing process where the there will be that time for the literature review where ... who need to find current existing applications of your work ... and often it does paint a complete picture of the problem that we are trying to solve so it's a pretty nice exchange. - PX

So for that I was reading papers focusing not only on developing new methods but how to apply those methods on real world data. So I usually start with focusing on the main problem I am trying to solve, so under generalization formalizing it better and understanding under what settings it works. - PX

How do data scientists access the scientific literature?

Data scientists seeking the literature

action-web-search, 14

So I usually search by a very broad term like, ... i'd like to see if it's returning anything reasonable. - PX

(mostly mentions of using keyword search)

action-site-search, 8

(mentions of keyword search once again - hard to distinguish from action-web-search)

action-reformulate-query, 9

(think aloud descriptions of reformulating queries - adding or removing terms etc.)

action-paper-guided-discovery, 15

Once so once I get like two or three different papers that seem relevant to the work that I might be doing or want to know about I kind of stopped there and try to see what are the different papers that those papers are referring to, like some back context - PX

I try to remember like, especially when i'm working on specific projects, I tried to make notes of papers, whose related work, I found was pretty extensive. - PX

Once I find a paper relevant to my interests, I will I will skim it, but even if it's even if it's not actually useful to my own research like I really don't think this has almost any application, but it uses those keywords then I'll skim the intro and related work sections to try to find Like statements that relate to what i'm doing and then look at the citations associated to those - PX

I start closing in on my search around the time when I realized that every paper i'm reading is talking about other papers I have also read, or at least come across like more in kind of like important papers in the area. - PX

action-author-guided-discovery, 13

Sometimes you find something which is very interesting or relevant so one thing i do ... is i go and check who the authors are and i go to their personal websites or their google scholar profiles to see what else they have published because it may be relevant as well. - PX

pp-keyword-mismatch, 5 (also pp-unfocussed-search)

the choice of keywords is actually a pretty big issue ... because the keywords can lead you to wildly different places. ... I had to generalize my problem statement ... and look for statistical methods which merge or combine different estimators. So it just took me very long to find the right combination of keywords to find an existing method in the literature. - PX

I had an idea in my head, but I was not sure how to map this into like normative terms used by communities or even necessarily what communities are interested in this ... And in like ultimately it just took like trial and error like finding some papers and then coming back to it later and like over several weeks, I did the search different times and eventually kind of started to find things that find terms that actually matched. - PX

We thought about [problem] for a while and we ... got a bunch of results from that. ... A couple of weeks later, my advisor said he mentioned that he found some other paper ... he said that this problem actually has a name. So we found that there is there was some work done back in the 1980s and 90s with a completely different name for a very specific instance of our problem. So we were working on something ... we thought that no one actually had asked this question, we turned out to be wrong, and a small number of people were like literally writing 60 page papers on it. - PX

The non standard usage of terms can be very much from the same discipline. In math the same term is used in very different contexts in different places, such as the term "normal" ... but those are ... things which you are on the lookout for so your alarm bells start ringing about Oh, I have to make sure that i'm not misreading this - PX (while reading)

The biggest challenge is in terms of looking beyond this niche into more general machine learning or maybe even beyond to see sort of like whether the general idea has been applied, and the various levels at which something would have been done. And the challenge is [that] different people described the same problem, using a different language, based on the fields they come from. - PX

The selection of the keywords for searching for the paper was very [hard]. I was stuck at the beginning, because I had limited knowledge and I used to search using the same keywords - PX

pp-action-seek-older-paper, 1

I would even be interested in understanding ... if back in the 80s did people did small lines of research on this area. And I have no idea how to do that. - PX

pp-no-filter-control, 1

(no quotable)

The literature finding data scientists

action-intentional-subscription, 7

I guess, this is more recent, but there are like a few researchers who select like 10 papers every week I think on Twitter - PX

Because I feel like the people you follow are mostly the ones who are closely related to what you're working on and then they tweet about their own paper so through Twitter it's also very easy to find papers - PX

So I mostly use google scholar to find papers, I do have some search keywords on arxiv and the daily email you get but it becomes like noise after a sometime, you dont pay attention to it - PX

Anyone can get subscribed to [company wide email updates], ... and anyone interested can join the group curating these papers ... and anyone can actually send it to them as well and they take a look at whether it has already been said before, if not they'll send it out - PX

action-automated-recommendation, 4

I use it every day and start off with all the recommendations of papers, they give me for that day, and then I go through them and see what's relevant or not and based on how much time I have that they actually read the papers and also based on how many relevant recommendations, they actually give - PX

another feature of google scholar that i like is that when you have a profile on google scholar you get recommendations as well And those are on point, my recommendations work very well. So that is a list of very recent work that i should take a look at - PX

action-author-guided-discovery, 13

Sometimes you find something which is very interesting or relevant so one thing i do ... is i go and check who the authors are and i go to their personal websites or their google scholar profiles to see what else they have published because it may be relevant as well. - PX

So this is, I think, a good paper because it's the paper of [name redacted]. I know from experience that he has worked in this ... for a long time. That's one of the indications that this paper is relevant to my work. ... I'm assuming that these [coauthors] also have similar works in this space. - PX

One thing that I constantly have going on, especially for ... just keeping up is Google Scholar alerts for authors that I have previously found interesting ... so just I subscribed to a few authors and then keep getting like weekly digest of what they have done. - PX

I follow people on Twitter who work in areas that i'm interested in and I may discover papers that are published, people often will post ... very recent papers or they retweet someone else papers and that's a common way that I discover a lot of papers ... so this is more of a passive way for me to discover papers - PX

When I started I used to actually like scroll through my Twitter feed and pull out papers that were interesting, but now I feel like that's just takes a lot of time - PX

action-social-recommendation, 14

Mostly with my supervisor ... if you want to know more about this topic, then these are some important papers say you should definitely go through - PX

I like to ask for referrals from professionals or researchers in that area and i'll try to ask them if they know of not necessarily papers ... but they'll say oh check out this author - PX

A pointers slack channel where people keep sharing whatever they find interesting and then if i have read it then we get engaged in a discussion on slack, sometimes follow up papers come up as a result of reading group ... and sometimes if i find a paper which may not be relevant for my work but i know someone else is working on something similar then i just forward it to them. - PX

If they find something which they think is interesting for me they send the paper, if I'm starting a new project and we had some discussion then people suggest papers to read. - PX

There are some of my peers from both industry and academia, who if they come across something interesting and they know that [name] is working in these problems or she finds them interesting they just send it over. Sometimes it's, not even a paper it's just a blog post, which then has links to other papers. - PX

I guess the benefit [of recommendations from peers is] its gone past one set of eyes. So there's some added incentive to read it, somebody said it was interesting and that the claims make sense. - PX

pp-domain-bubble, 4

Okay, the thing is with so like recommendation system recommender systems are mostly very bubble based right so I think that might be an issue that i'm probably heavily in my own bubble of papers So, so that would So say if i'm working on hate speech, most of my recommendations will be very computer science based but maybe there's like relevant stuff in It when social science, or something Then, probably never going to come across - PX

I don't follow people outside of my own bubble, so on Twitter, I have to control myself, maybe I should seek for more papers for things outside of computer science - PX

(also quotes in pp-keyword-mismatch)

pp-subscribing-but-not-opening, 5 (related to passively understanding discipline)

(quotes above)

pp-no-recommender-control, 1

I don't follow people outside of my own bubble, so on Twitter, I have to control myself, maybe I should seek for more papers for things outside of computer science - PX (also in author guided discovery)

pp-continually-updating-on-lit, 1

(no quotable)

pp-subscribing-to-topics, 1

(no quotable)

Soliciting recommendations from peers (merged into another theme in the paper)

action-social-recommendation, 14
action-seek-experts, 1

Mostly with my supervisor ... if you want to know more about this topic, then these are some important papers say you should definitely go through - PX

I like to ask for referrals from professionals or researchers in that area and i'll try to ask them if they know of not necessarily papers ... but they'll say oh check out this author - PX

A pointers slack channel where people keep sharing whatever they find interesting and then if i have read it then we get engaged in a discussion on slack, sometimes follow up papers come up as a result of reading group ... and sometimes if i find a paper which may not be relevant for my work but i know someone else is working on something similar then i just forward it to them. - PX

If they find something which they think is interesting for me they send the paper, if I'm starting a new project and we had some discussion then people suggest papers to read. - PX

There are some of my peers from both industry and academia, who if they come across something interesting and they know that [name] is working in these problems or she finds them interesting they just send it over. Sometimes it's, not even a paper it's just a blog post, which then has links to other papers. - PX

I guess the benefit [of recommendations from peers is] its gone past one set of eyes. So there's some added incentive to read it, somebody said it was interesting and that the claims make sense. - PX (tagged for establish credibility)

If you don't know where to start and you just reach out to someone especially in a company like [redacted] it's actually a lot easier, because there are a lot of subject matter experts and they help us. - PX

When I was starting a new project I would either be given a paper by my supervisor or I would ask them for some places to start. ... During our initial conversations I'll be like hey so since they usually have been going over the ideas I would ask if they have any literature to point me to. - PX

How do data scientists select papers?

An onslaught of papers

pp-unfocussed-search, 11

The most frequent reasons of frustration ... is getting bombarded with a lot of results ... you just wanted to do a shallow focused lit review and probably get some open source code to try out something but instead you kind of went back into that rabbit hole. - PX

There are a lot of papers that are incremental updates so to find that one seminal paper that actually started it all is going to be painful and unless someone helps you out it's actually very hard. And they're also very few blogs kind of thing in the ASR space that people help you with. - PX

pp-exploding-paper-volume, 8

especially now that nlp or like Ai in general is like exploding there's a lot of papers out there I think, is the volume of papers uh huh and what and then on top of that, the volume of papers that might seem relevant to you, but are actually not - PX

If the field is very crowded - sometimes I find RL, and the problems I am focusing on to be crowded, then it becomes frustrating and you're always finding papers [that do the same thing]. - PX

Unless you feel that you're very familiar and you're keeping up with the most updated [papers] ... things can go very easily out of hand. In grad school ... a professor synthesizes these things and says hey, this is the main theme of all these papers ... When that information is there for you ... start reading the paper it tells you what to expect otherwise you're spending a lot of time and don't understand how different it is from previous papers. - PX

I think one thing is just the sheer volume of literature available to us ... it basically means that you have to either start [with] a 100 papers and you know probably not even understand them totally, which might be a waste of your time or you have to give up on a whole section of them and just read a few in depth, which is also not very effective - PX

pp-paper-fomo, 6

I personally end up going down rabbit holes very often i'll end up following like references unnecessarily back into the past ... My sort of solution for it, it doesn't always work ... is to remember good reviews and some paper I've read before so those serve as good starting points - PX

So it almost becomes a problem of balancing the horizontal search, like the papers i have selected the 10-12 papers which are most relevant. And the depth search of taking each of them and going in the rabbit hole of their literature reviews. - PX

action-prioritizing-results, 8

I'll usually find like either make a keyword search and see a bunch of papers or i'll go to Papers With Code and see a bunch of relevant links and I'll usually click on like five or six of them either the first ones that I see in the in the priority list or i've heard of or like seemed for some reason interesting so ill click a bunch of them and so maybe follow almost like a breadth first search right, where I like first look at at the top level all these papers ... - PX (tagged for action-site-search)

yeah I usually have a bunch of tabs open up during the meeting because and then I go back and filter if it's relevant or not - PX

I mostly go until the 10th page and it also depends on how many tabs I have open, if I feel like i'm gonna crash my laptop with the amount of tabs then I start going through each paper again and ... make a mental ranking of is this actually still relevant for me. - PX

And I also like to sort of focus on a few works at a time, once I started do a broad search of literature and I find a couple of good things I'll just kind of focus on those, really figure out what I could glean from them and then move forward to potentially finding more resources - PX

So I get a bit frustrated basically a lot of information overload reading a lot of papers, so i try to take a pause and focus only on two papers which I feel are good for today so the papers I read in detail, like I can say if u understand it 50-60% minimum i put it in notion and put a paragraph of what the high level summary was - PX

Now if it doesn't do anything groundbreaking that i'm supposed to remember, then it's just a one line sentence, probably, but if it's like very distinct from things that I haven't read before and I might want to use this in any type of way, then I mostly write down a longer summary. - PX

And yeah so mostly in this way I have like at least 5-10 papers initially and then go through at least one or two papers and [decide] to go ahead with other papers or not. - PX

minimize-information-overlap, 1

make like a mental ranking of okay how actually is this actually still relevant for me or does it contain contain information that's being repeated in other papers as well - PX

balance-depth-breadth, 4

So it almost becomes a problem of balancing the horizontal search, like the papers i have selected the 10-12 papers which are most relevant. And the depth search of taking each of them and going in the rabbit hole of their literature reviews. - PX

discovering-repeated-citations, 6

I look if [papers] refer to some papers, or some method over and over and try to find those papers and add that to the collections - PX

Usually there will be a collection of papers cited in all of [the papers] and for me that is a good proxy of here is the core papers I should be looking at. - PX

Whatever paper i read about bayesian optimization keeps referring to this so its some confirmation that I should read this. - PX

pp-nonexplorable-result-page, 4

Since the results are not arranged in a way that is easily consumable ... and you have to manually take that effort. - PX

people often would place significance in different keywords based on the actual problem and it often misses some of those high level keywords which can be used to cluster those papers together for different purpose - PX

pp-summarize-paper-group, 5

what kinds of lines of research exist in a certain area I will want to go and explore what has been published throughout multiple years because I don't Just care about one year, I want to see sort of like a five year trend, or even a 10 year trend. - PX

You know, no one summarizes the way they do on Twitter it's kind of like you find the paper you have paper, there is no like nice handy here are the highlights. - PX

pp-more-survey-papers, 1

(no quotable)

Establishing the credibility of papers

action-connect-toknown, 7

I particularly picked this paper out of the whole lot, because I know of two authors here, who I followed their work before, so I know that it's or I like it's a good paper, I might be aware of what they have done before. - PX

action-establish-credibility, 8 (related to social aspect; Stuck in their domains of knowledge)

If a paper gets accepted at ACL, or NAACL, or EMNLP I trust this paper it usually is probably good because it was peer reviewed [so] I would probably spend some time looking at the people and see what they're doing. - PX

Which is an area that i'm really not familiar with ... the literature review is really hard because ... in addition to seeing ... if my information need is like being answered by this paper I'm also doing a sort of implicit classification of whether the paper is reliable or trustworthy. - PX

I guess the benefit [of recommendations from peers is] its gone past one set of eyes. So there's some added incentive to read it, somebody said it was interesting and that the claims make sense. - PX

Trying to figure out the credibility, based on discussions by online forums - PX

Authors from institutions that I perceive as reputable, publication venues that I interpret as reputable or stuff like that ... that's sort of how I typically judge a paper - PX

Another sort of difficult thing is like because i'm not very familiar with security and privacy, ... I don't know which conferences are like good ... so I might try to ... figure out what are considered the top venues in that area. - PX

[In peer discussions, I] sanity check that this paper looks useful, do you think it's useful, should we apply something similar, should we try this method - PX

action-judge-papers, 2

Sure, especially in a field that I am not familiar with, I will make a lot of use of, like, looking at citation counts on Semantic Scholar as a way to assess the kind of intrinsic value of that paper to the community - PX

In a field that [I am] not familiar with it's much harder to make those judgments as to the intrinsic value of a paper - PX

Because you don't always know how reliable the work is ... sometimes I also kind of just try to judge it based on the venue it's been published at. I tried to get a feeling of, you know, what type of language they're using. Sometimes it's like I'm seeing this one is saying DNNs and it's published in 2021, where people don't really say DNN; these are the subtle cues that I look up. - PX

pp-establish-credibility, 2

(similar to action-establish-credibility)

pp-long-paper, 5

But yeah sometimes, papers are very long and may be tedious in themselves. But what ends up being tedious is that you invest a lot of time reading them and then cannot use them really - PX

pp-mismatched-info-scent, 8

People do a lot of rebranding, sometimes a lot of ideas are not very new .. but the motivation section is like poetry .. but when you read the details you feel [it's] not what they are claiming they do. ... [or] exaggerating their contribution and not meeting the expectation in their experiments. So identifying those trends from papers is very important for those who are trying to directly work in those areas. - PX

You might find later on that someone in this appendix actually improved the thing that you spent the last week proving or the last month proving even. But this is done as a tiny appendix somewhere ... which you had not paid attention to. - PX

Oftentimes, I feel that the papers are not written with the idea in mind of what people are actually looking for. - PX

pp-judge-papers, 1

(Similar to pp-mismatched-info-scent)

pp-unclear-reading-depth, 2

I never know how deeply I should be reading any given paper and there's this frustration of always having to come back and possibly read it again because I did not read it deeply enough the first time - PX

Everyone skims papers

action-skimming, 17

Mostly look at figures because I feel they [give an idea of] if it's done in a correct way. They give a quick overview. These confusion matrices, for example, really help in easily understanding what kind of results they get. - PX

If I just have an intuitive sense that it's not going to be useful or if I can skim it and immediately get what I need, or I can get some kind of pointer for what I need I can just move on - PX

Sometimes here I'll just skim through the tables that might give me like a very practical sense of what is being measured - PX

When reading a paper, I know exactly where to start reading to get a high level idea - PX

I tried to jump to wherever they say "in this paper", because I'm fairly familiar with the space I don't need to actually look at the introduction. - PX

(statements of looking for specific things - hard to glean more from this)

action-skimming-abstract, 20
action-skimming-paperstructure, 8
action-skimming-introduction, 8
action-skimming-conclusion, 7
action-skimming-method, 5
action-skimming-results, 5
action-skimming-related-work, 4
action-skimming-problem, 4
action-skimming-figures, 4
action-skimming-baselines, 3

What do data scientists do with found papers?

(excluded from the paper)

A multitude of workflows with a preference for simplicity

action-paper-management, 14
action-topical-organizing, 7

[This extension allows] names for your windows and persists your windows with the tabs open in them. I will usually have a window open for either a project i'm working on ... and i'll have one for reading ... when I'm ready to close down I'll just go through my tabs and copy them all to Notion. - PX

I save [blogs] on Pocket ... papers, I just leave it open in one of the groups in my browser and if I have some free time, I just open a group and start browsing just to see if there's something interesting that I could look at. This is for casual browsing. On the other hand, when I have a very specific research area ... I do this process of using Google for getting a lot of papers and whatever was interesting I save them locally. - PX

I collect these PDFs on my iPad. I keep a running list of what I am reading, so this is as simple as the author, title and publication year and venue in a notepad on my iPad. It's just handwritten, it's not ordered; if anything it's ordered in the order that I read the papers.... sometimes I star or bookmark a page and I tend to do that when I am in a rush ... and then I routinely go through my bookmarks and see what is there. - PX

So just the search portion I found that my methods for being organized have become simpler and simpler over the years, but I basically just use like a notepad ... If something is actually, like, important enough for me to read, I just ... write about it in the draft of the Related Works in the draft of the paper - PX

Recently I moved to like a tool that somebody in my group coded up, it's a simple web interface and maintains a database on the back end, but it does the same job - there's like one interface, which is a list of stuff to be read and another for notes - PX

action-saves-in-browser-tab, 9
action-browser-bookmark, 5

(very similar to action-paper-management)

Annotations to retain context

action-note-taking, 16
action-paper-annotation, 7
action-highlighting, 6
action-inline-note-taking, 4

I have a whole overview of which papers I'm supposed to read and then per paper I write down a summary depending on how relevant the paper is for me. If it's not very relevant, it doesn't do anything groundbreaking that I'm supposed to remember, then it's just a one line sentence. But if it's very distinct from things that I have read before and I might want to use this, then I mostly write down a longer summary - PX

So the papers that I read in detail, if I understand it 50-60% minimum I put it in Notion and put a paragraph of what the high level summary was. ... I like to make the list of papers in my local system only if I understand it, not when I just know about it, because that is more noise in my head. - PX

I have found that it's easy to take notes with the data itself ... and that gives you a sort of searchable database of your own notes. - PX

And because it's on a tablet so you can modify the structure of the papers, so that you can retain context there. You can insert pages ... in those pages you can take a screen grab of certain models, or a certain equation, create a scratch notebook of sorts. - PX

In the google doc I just write a small snippet about this paper, so that later if I need to go back to this paper, I can quickly read that and get refreshed about what I was reading. - PX

I do think of a lot of questions [while reading] ... and I will mark them, so, if I can remember some question that I thought of when reading the paper that sort of simplifies my life. Instead of remembering the paper, I can remember the question. - PX

When I read a paper, I have this whole color coding scheme where basically I highlight the most key points in a particular color, the one takeaway message from the paper. So I know that if I come back to the paper and I need to just get back one key message and I look at this one color. - PX

I make notes of what I think about it, what questions I have, and what I would want to do with this information ... it almost always boils down to if I can use this directly, or I can build on this to do something differently. - PX

I will just write myself a note like "this paper talks about this this" and "these are its limitations or deficits" - PX

I usually like to read it on my iPad I have Notability which helps you make notes and write equations and circle things. - PX

I use highlighting a lot, I tend to highlight with different colors, things like what is the problem, the existing methods they look at, and what they compare against. - PX

To read the papers I use some color coding to mark the PDFs with different colors - the main motivation of the paper, the main contributions of the paper, and such - PX

(above codes are all similar)

What challenges do data scientists face in reading papers?

Understanding the hidden details

action-seek-code, 7

Literally looking at a paper and seeing whether other people have code which uses that same algorithm or whether they've run into similar issues we have run into or whether there is an official repository for this sort of research paper. - PX

(many about mentions of seeking code - often to prevent re-implementation)

code-fill-missing-details, 2

Is there any publicly available code for what you're doing? Because many of these papers look good on paper, but then it's unclear how to implement them. Or it's unclear which specific hyperparameter choices they made. - PX

This paper is a great example for example they are suggesting some methodology but I don't know, for example, whether this is exactly a CNN, or what type of convolutional network it is. What is the exact loss function so. Even though I can try to find it [in the paper] it's a little more time consuming. However in the code, I can simply look at the code and it's easier to find exactly what they did. - PX

pp-missing-details, 10

So a big part of [the frustration] is when there are incomplete details or a lot of things are assumed. - PX

Often the things that are in the paper are the things that make a good story, but [not] the things that actually matter in the paper, for example, like how you set the hyperparameters. - PX

A lot of times in the paper you don't get the complete idea of the details. - PX

And then of course there's always a thing that if you have the time to read a paper and you are getting into it, you face a lot of detail. Sometimes, thanks to length, certain details have to be omitted and, at times, those are the very details that would cause certain confusion to a reader. - PX

Understanding the math on display

pp-mathy-writing, 8

Oftentimes these papers are very technical and I might not be very well versed with the technical terms just yet so a very simple example can be very helpful. A real life scenario which can be connected very easily and with that example if the author [shows] the shortcomings of the previous work and how the current work excels, it becomes very clear. - PX

action-seek-blog, 4 (PX for action-understand-problem)

Blogs help because sometimes the writing [in a paper] is not that easy to understand, as compared to an informal way of writing, especially if you just want a high level understanding of the paper. - PX

pp-understanding-algorithm, 1

[Code] helps with sometimes understanding papers and especially math like papers, which are very math heavy so difficult. - PX

pp-difficult-to-understand-paper, 2

I've written for niche audiences ... even for this niche audience it is having to show that this is important or useful and often, like, that means that they will add equations, theorems [for an idea] that really is not as complicated ... if there's a lot of math or if it's hard to understand it must be impressive. - PX

Struggling to understand the increments of progress

action-diff-to-paper, 6

So when you're not an expert ... once you've read 10, 20, 30 papers it becomes a lot clearer what the core idea is and what the delta from the core is. ... For things you're not as well versed in it's often just a nightmare, when you don't know the context in which the paper is being written. Understanding what exists in the literature and what doesn't is hard, then what the contribution is and why the contribution matters is hard. - PX

If there are famous, really impactful papers, or papers which are really interesting for a lot of people, then it is interesting to see how different authors put them in the context of their own work - PX

There might be pros and cons [between methods]. And that is very helpful in a literature review because when you then write a paper you have to determine the limitations and advantages of all of these and just visualizing it gives me an idea of what I am dealing with. - PX

Sometimes if a paper is highly relevant to a project we are doing, more often with my lab mates. Lets say I am doing a project with someone, i schedule a meeting to discuss that paper in detail so that we understand different aspects of it and we can discuss what we are doing differently so that we can position our work better. - PX

You want to build a mental map of this but then it gets very noisy. In the sense that it's very hard to make sense of what I am trying to do vs them. So I would say lack of understanding triggers that frustration and then the list becomes longer and I would not know what to write about those works in my lit review. - PX

Similarly many a times i like reading blogs. They make it a lot more interactive. - PX

pp-cross-domain-understanding, 2

I know nothing about this research area and I don't know another person who's already an expert who could just give me this information - PX

Sometimes it's a difference in philosophy, I like to make some assumptions, as an example, I don't use a lot of deep learning in my work but if a lot of people are using deep learning for the problem then I might not understand what they are doing. Then literature reviews become diffused, [every paper seems similar] - PX

I was reading this paper on RL on how modular systems can help in generalization, [there were] four fields from which ideas were combined and so it looked all fancy and exciting but I had to actually understand what they did. And the paper does not provide the opportunity to explain the principles. - PX

I find that there is sort of like a subculture associated with different fields. If you understand the principles and where they're coming from, then it's possible to easily read the paper and understand what they say and what they do, but if you don't I find it very, very hard to sort of figure out what the paper is doing, why it's doing it. - PX

pp-bad-writing, 2

The lack of clarity on their contribution - it might come from how they wrote that paper or it might come from my complete lack of understanding of what they might be doing. - PX

Some papers are just completely an attack ... you see a whole lot of abbreviations, which are sometimes never put in full form once, or a lot of concepts which are totally unfamiliar, so reading that one paper becomes like reading one paper plus 100 other things - PX

pp-missing-assumptions, 2

(similar to missing details)

pp-missing-experiments, 2

Also, sometimes incomplete comparisons to algorithms are frustrating. So let's say you're using a specific algorithm already and they don't compare against that. Not that the authors themselves will be able to compare with every algorithm under the sun, but if you're in the same domain it kind of gets frustrating - PX (relates to diff to paper)

The artifacts surrounding a paper (excluded from the paper)

pp-bad-code, 4

Mostly it usually falls into the bucket of incomplete reproducibility of the work, whether it be code [or] guidelines. So yeah, it usually falls into reproducibility, especially using code. Or they do [provide it] but then it's, like, very hard to reproduce it. - PX

So no, I think it's definitely, for the work that ends up being adopted, that is actually one of the frustrations, right? Like a really good [paper] might not have good code, [and] you're not using it because it doesn't have good [code]. And maybe, like, a shoddy paper has a good implementation, but you kind of know it works, so you don't really care about the paper as much. - PX

pp-missing-code, 2
pp-assumed-resources, 1
pp-missing-datasets, 1

The other thing is sometimes, like, open source data that a lot of papers [use], especially if it's an old paper, seems to have just vanished off the Internet. It happens. [I] quickly open that up and see if it works, and a lot of times I see that older papers point to data sets that actually do not exist anymore. - PX

pp-outdated-resource, 1

It would be nice to have sort of a website that just lists down an updated version of all the open source things related to a particular topic. Like sort of your Papers with Code, but Papers with Code also doesn't update by saying, oh hey, this data set doesn't exist anymore, or it has a new, updated version that made corrections on some errors that were there in the older version, or something like that. - PX

How data scientists lean on social ties

Collaboratively brainstorming and making sense of papers

action-existing-mentor-discussion, 14
action-note-sharing, 5
action-peer-discussion, 12
action-collaborator-discussion, 11

I'm part of some reading groups in the university and they'll usually present paper every week and most of it is not relevant to what I am working on, but sometimes it will be or sometimes it might motivate new ideas. - PX

Usually you discuss some recent conference paper and sometimes we talk about how we can use that in our system [this is] usually casual and open ended. - PX

Mostly talk about whether this method would be beneficial for us, what the algorithm specifically does, which parts we can take. [We may also discuss] as to why did you go from this to this step, that is another way somebody who's reading the same paper online as a collaborator or a sort of discussion forum readers can help you answer as well. - PX

Sometimes I also talk about the paper with my lab mates ... research is mostly about brainstorming right. So with lab mates we share our ideas ... this is what i'm thinking, what do you think. We have a weekly meeting so when we discuss our work with other peers they suggest that maybe that paper is relevant to your work and that's how we find the paper as well. - PX

When I am doing targeted research I'll almost always have someone to collaborate with. At the very least, what I do is I talk about my idea with someone else just to see if I'm in the right direction. On the other hand, we have biweekly brainstorming sessions in which we discuss at least one paper ... when we do do that these will be papers very relevant to the work we are doing and so we get a lot of inputs ... this might be useful, this might be a limitation, it might not actually work. - PX

If i'm having one on one meetings then we dig into why people made certain decisions in their paper, and if we should be following the same. - PX

The kind of feedback which advisors provide is useful, insightful and new, which literature review cannot provide. ... But of course there are certain aspects of the problem which they might be missing. So to get feedback for those, reading papers helps. - PX

It also helps with collaborators double checking my writing. - PX

Whenever we have a meeting I summarize the papers by myself and then just present my findings. It is good to have this information presented so that we can know that we are comparing [to other papers] in a valid way. - PX

I usually share these notes with collaborators so to give them a sense of where I am getting this idea from or where this hypothesis is coming from. ... I share mostly if I have made some extension on top of that and I discussed it in the project meeting. - PX

Its mostly asynchronously to let the collaborators know what papers I am reading. Once I have these papers listed then I'll try to summarize them in a slide if I have to present it to somebody or i'll just forward this [list] to other people - PX

action-collaborative-writing, 5
action-existing-mentee-discussion, 1
(above codes are all similar)

Leveraging authors

action-initiate-author-discussion, 5

First off, if I find a very important paper for my work ... I do reach out [to the authors] by email a lot. Sometimes it's about the methods themselves, and I might have a question about an algorithm. For example, is this algorithm the same as [another algorithm], or if the limitations are not clearly stated. - PX

A little while ago a new paper got released that was very similar to what I'm trying to do ... and so I emailed the authors ... this is something I will typically do in this type of situation to create a conversation and also connect with researchers who are doing similar work to me. I had a multi-day email chain with them where we discussed things and I wanted to confirm the differences [of my method] with them, which was extremely helpful. - PX

[If it is a] new person unknown to me, and I'd like to keep them in my network, I do send them a message congratulating them on the work they've released and [try to] understand some of the secrets behind their work. - PX

It happened that I contacted authors two-three times, asking for some data sets, and those two contacts that I have made led to good collaborations and now we are writing some papers together. - PX

Sometimes it happens ... my boss kind of finds certain things and then says hey I think we should work with them, and puts you in touch with them. - PX

Listening in on others' conversations (merged into another theme in the paper)

action-seek-conference-discussion, 7

And then sometimes listening to talks ... it's easier to understand a paper if someone presented it in a conference. So listening to a shorter version of it, the paper might be too heavy and too full of details. - PX

I will also look on YouTube to find the presentations of the articles and that helps me understand better and decide whether I want to read the paper or not. - PX

action-seek-scholarly-discourse-online, 6

It's sort of trying to figure out the credibility, based on discussions by online forums ... Even if this is highly reviewed, what other people who have worked in similar domains think about it. Sometimes you can also get very far out discussions, let's say a university class discussing that same paper and then putting out some notes and you just get more context. - PX

So this online community is called ML-collective, their Discord channel has a lot of volunteers and contributors ... if you can figure out someone who's interested in the same kind of work ... [you can be] talking about some question ... like how does the algorithm go from this step to this step or what is this variable. So like specific questions [about] details in the algorithm. - PX

Or you will find a Quora answer or in Yahoo answers which are very heavily dominated by exceptional mathematicians. Say someone will have asked a similar question and then someone will have pointed them to some relevant work. - PX

Sometimes on Twitter when people speak about the papers, or like tweet summaries of papers they have read, those are helpful. [People on] social media have highlighted here is what I thought was the most interesting, have a critique as well. - PX

action-seek-author-discussion, 6

Sometimes you can see the authors themselves explain their paper on a panel or a recorded talk ... Sometimes also you can get things like the author themselves talking about it on a podcast. Which is insanely helpful. - PX

And sometimes getting a very good intuition of what their motivation was helps understand. And sometimes it might be not that clear in the paper but when they explain to you in video its much more exciting. I think visual communication [is useful], like in their talk they used a lot of good design and their slide animations ... that cannot be communicated in a paper. - PX

I find that talks are more high level and it's a lot easier to get a sense of what the paper is actually doing. Something I say is people write papers such that they get accepted and reviews can't say anything about them. But I will give talks to inform people; the pretense of having to get a paper accepted is no longer there. Then it is to sort of make sure the paper is understood and well received. I find that incentives for talks make it more clear. - PX

But sometimes if the author [themselves] was giving a talk about the paper, even if it's an hour or something, I feel like it's probably worth it, because they understand the topic best. But this isn't always applicable because I think it's only the more famous people's projects that get this much coverage. But this is like I'm presuming I'm going into a brand new area and starting with the most famous people around there. - PX

action-seek-fellow-users, 3 (Understanding a paper)

Unorganized because of being too infrequent, incorrect, or not a focus for the paper.

Practices and painpoints

action-alternatebw-recency-relevance, 4
action-alternatebw-search-discovery, 3
unexpected-useful-paper, 7
action-alternatebw-exp-reading, 4
pp-alternatebw-body-references, 2
pp-interface-problem, 4 (not everything here is about workflow)
pp-inflexible-workflow, 2
pp-no-browser-import, 1
pp-reading-support, 1
action-browser-paper-import, 5
action-printed-reading, 5
action-ipad-paper-import, 3
action-question-the-known, 3
understanding-algorithms, 3
action-writing, 2
site-reverse-engineer, 1
action-paper-sync, 1
action-keywords, 1
action-lit-for-peerreview, 1
pp-access-restrictions, 4
pp-nonexplorable-conference-site, 2
pp-lost-research, 1
pp-peer-review, 1
pp-no-talk, 1
pp-trouble-refinding-items, 2

Tools for saving and note-taking

tool-overleaf, 8
tool-ipad, 8
tool-zotero, 6
tool-goodnotes, 3
tool-notion, 3
tool-googledocs, 3
tool-custom, 2
tool-mendeley, 2
tool-adobe, 2
tool-google-drive, 2
tool-obsidian, 2
tool-excelsheet, 1
tool-msword, 1
tool-googleslides, 1
tool-zetaalpha-reader, 1
tool-browser-extension, 1
tool-physical-notebook, 1
tool-semantic-reader, 1
tool-google-sheets, 1
tool-icloud-strogae, 1
tool-trello, 1

Other tools

tool-googlescholar, 18
tool-google-search, 9
tool-slack, 9
tool-twitter, 8
site-arxiv, 7
site-twitter, 7
tool-feature-recency, 7
site-conferencewebsite, 4
tool-semantic-scholar, 4
tool-text-editor, 4
tool-feature-citationfilter, 4
site-author-homepage, 3
site-aclanthalogy, 3
tool-feature-authorgspage, 3
site-medium, 2
site-openreview, 2
site-paperswithcode, 2
tool-librarysearch, 2
site-ml-collective, 1
site-reddit, 1
site-wikipedia, 1
tool-zetaalpha, 1
site-pubmed, 1
tool-acmdigitallibrary, 1
tool-feature-influencefilter, 1
tool-signal, 1
tool-zoom, 1
tool-distill-pub, 1
tool-whatsapp, 1
tool-linkedin, 1
tool-github, 1
tool-powerpoint, 1
tool-research-gate, 1
tool-youtube, 1

Participant designations + research areas

Redacted for privacy.

Participant job goals

Redacted for privacy.
