% ---------------------------------------------------------------------------
% Section 2: Impact
% ---------------------------------------------------------------------------
\section{Impact}
\label{sec:impact}
The central impact of the \TheProject project will be a significant improvement
and extension of the Jupyter tools and ecosystem described in section \ref{sec:project-jupyter}
to facilitate open science and reproducible research,
and their wide availability on the European Open Science Cloud.
While these tools will initially be developed alongside applications for our
own institutions, we expect them to be useful to a much larger
group of researchers, as data analysis and simulation are now crucial parts
of most academic disciplines.
Jupyter-based technology will be especially valuable given the decentralised
nature of EOSC. Large-scale experiments often produce so much data that it is
impractical to transfer the data to another site for analysis.
At European XFEL, for instance, hundreds of terabytes of data may be recorded
for a single user experiment. The analysis steps thus need to run where the
data is stored, even if the scientist has returned to their home institution
far away. As the Jupyter Notebook interface runs in a web browser,
it is readily usable for remote access.
In addition, \TheProject will produce highly visible demonstrations of
notebooks for open science in targeted scientific disciplines.
The result will be innovative new prototype services that will provide
direct benefits to the early adopters in various research fields.
Moreover, these services will demonstrate a strategy for open science
using notebooks that is applicable across many domains.
EGI supports the e-Infrastructure uptake of several ESFRIs in the EOSC-hub project.
Several of these already experiment with Jupyter (using either EGI's JupyterHub
service or community-specific installations). EGI will reach out to these
communities directly in the EOSC-hub project, and through the annual events it
organises around EOSC: EOSC-hub weeks, the DI4R conferences and EGI conferences.
During these events, EGI will help promote the adoption of the project results.
\subsection{Expected Impacts}
\eucommentary{Please be specific, and provide only information that applies
to the proposal and its objectives. Wherever possible, use quantified
indicators and targets.\\
Describe how your project will contribute to:\\
-- the expected impacts set out in the work programme, under the relevant topic
(including key performance indicators/metrics for monitoring results and impacts);\\
-- improving innovation capacity and the integration of new knowledge
(strengthening the competitiveness and growth of companies by developing
innovations meeting the needs of European and global markets; and, where
relevant, by delivering such innovations to the markets;\\
-- any other environmental and socially important impacts (if not already
covered above).\\
Describe any barriers/obstacles, and any framework conditions (such as
regulation and standards), that may determine whether and to what extent
the expected impacts will be achieved. (This should not include any risk
factors concerning implementation, as covered in section 3.2.)}
The expected impact of \TheProject with respect to the
work programme is detailed in the table below.
\begin{center}
\begin{tabular}{|m{.3\textwidth}|m{.7\textwidth}|}\hline
Expected impact & Contribution of \TheProject \\\hline
Integrating co-design into research and
development of new services to better support scientific, industrial and
societal applications benefiting from a strong user orientation &
The Jupyter tools have always been driven by a close connection to users; since
the project began as IPython in 2001, many of the developers have been
scientific researchers using the tools as they developed them. More recently,
when Jupyter has benefited from dedicated developer time, developers have
remained in academic institutions, in the kind of role now referred to as
`research software engineers', allowing day-to-day interactions with
researchers using Jupyter in a wide range of fields.
By supporting developers in various research institutions where the improvements
will be used as they are developed, \TheProject will continue this invaluable
collaboration.
The improvements and extensions of the core parts of the Jupyter system are
being co-designed by technical, industrial and scientific experts in the
\TheProject project, so that they will be widely applicable in new innovative
services across most academic disciplines.
The impact of this approach for enabling scientific use of notebooks is expected
to be very high because it is a direct response to the strong demand from
scientists for improving the productivity and reproducibility of their work.
The notebook approach is being embraced in many scientific disciplines, so the
services proposed in \TheProject are strongly oriented towards user needs.
\\\hline
Supporting the objectives of Open Science by
improving access to content and resources, and facilitating interdisciplinary
collaborations &
Jupyter notebooks have seen rapid uptake in many kinds of research,
because they bring together the essential elements of the modern scientific
computational workflow (from data collection to publication and open sharing)
in the familiar format of a scientific notebook, with powerful functionality
for access to scientific content for analysis and visualisation.
Notebooks also embody the core concepts of open science by providing a
mechanism to reproduce results in publications, and collaborative
sharing of not just scientific results, but of the code that produced them.
We expect the use of notebooks in EOSC to improve access to scientific code:
digital documents such as notebooks encourage publishing workflows, whereas
scripts and manual interactive workflows often stay with the researchers who
wrote or performed them. The focus on clarity and reproducibility also helps to ensure
that data is meaningfully accessible, by preserving the understanding needed to
make sense of the raw data.
We have already seen a good example of the Jupyter ecosystem facilitating an
interdisciplinary collaboration: the LIGO Scientific Collaboration shared
notebooks detailing the data processing steps which led to the discovery of
gravitational waves, using the Binder service to allow anyone to re-compute
the published plots \cite{ligo-open-science}. Scientists with no background in
gravitational waves studied these notebooks and improved the signal processing.
In this proposal, we want to provide this ability to a wider audience through
EOSC, including for disciplines which rely on processing much larger volumes of
data.
The astronomy application in \TheProject (\taskref{applications}{astro})
is designed to provide a new level of interoperability of
reference astronomy data within Jupyter notebooks.
By connecting new notebook capabilities to existing and highly used services,
we expect to have impacts for the users and also for the service provider.
The scientific users will have access to new capabilities,
and we anticipate adoption of new innovative ways of using the data.
We also expect an impact on the services themselves, in terms of usage,
but also in terms of capturing precious information and feedback on
how to evolve these services to best support open and collaborative use of
the data and services.
\TOWRITE{ALL}{More impacts for other applications/demonstrators ...}
\\\hline
Fostering the innovation potential by opening up
the EOSC ecosystem of e-infrastructure service providers to new innovative
actors &
Jupyter is a collection of open source software built around openly documented
protocols and formats, along with familiar technologies such as HTML and the
Python programming language. It is easy for third parties to create new
tools and services that use and integrate Jupyter, as evidenced by the thriving
ecosystem of tools already in development by both commercial and non-commercial
actors. To highlight just one example, the first version of the popular Binder
service was developed by a group at the Howard Hughes Medical Institute,
working independently of the core Jupyter maintainers, but building on the
powerful capabilities provided by Jupyter.
By bringing the diverse expertise of the \TheProject partners together in this
common project, we expect a high impact in terms of enabling a new level of
integration of scientific, technical and industrial interests for the common
goal of open science,
and building a toolkit from which others may build innovative services,
for commercial or public use.
\\\hline
\end{tabular}
\end{center}
\subsubsection{Measuring impact}
As we are building tools and services for Open Science,
the best measure of our impact is in the adoption and use of these tools and services,
which can be observed qualitatively (positive anecdotal feedback) and quantitatively
(counting visitors to a service, for example).
Much of our work will be in the form of contributions to existing public projects,
such as Jupyter and Binder.
This impact can be measured through our participation in those projects:
code and documentation contributions,
bug reports, and roadmap contributions.
We can measure our progress towards the aims and objectives in section~\ref{sect:objectives}
via the following
Key Performance Indicators (KPIs):
\begin{compactenum}[\textbf{KPI} 1:]
\item \label{kpi:workshop-attendees}
Attendees at Open Science workshops organised by \TheProject participants.
\item \label{kpi:binder-publications}
Open publications for which the authors have made a reproducible version available
through \TheProject services.
\item \label{kpi:binder-visits}
Visitors to \TheProject services, engaging with open, interactive communications.
\item \label{kpi:dissemination}
Publications and presentations by \TheProject documenting the use of \TheProject services for
Open Science.
\item \label{kpi:contributions}
Contributions by \TheProject and the wider community to Jupyter software and others,
including issues reported, bugs fixed,
features added, and roadmaps developed.
\end{compactenum}
\subsubsection{Barriers, Obstacles and Framework conditions}
The \TheProject project will face challenges as it undertakes
the ambitious programme of work described in this proposal.
We have identified several potential barriers and obstacles; overall, we
assess these to be minor, and plans are in place to mitigate the
identified risks.
While a number of the partners have worked closely together in previous projects,
the integration of new partners from different disciplines will require
dedicated efforts for communication within the project.
A detailed assessment of risks and mitigations can be found in section~\ref{sec:risks}.
\subsection{Measures to maximise impact}
\TheProject is contributing to tools for Open Science and for building and operating Open Science services.
Tools only have impact if and when they are used,
so it is important that we disseminate our work
in order to reach and support user communities for our software and services. This section
outlines how the project will establish and organise the dissemination and communication
actions to promote the project and the adoption of its outcomes beyond the project's lifetime.
The dissemination and communication plan is outlined in the following sub-sections.
Therein we distinguish:
\begin{itemize}
\item Dissemination as the public disclosure of the results of the project through
a process of promotion and awareness-raising right from the beginning of a project.
It makes research results known to various stakeholder groups (like research peers, industry
and other commercial actors, professional organisations, policymakers) in a targeted
way, to enable them to use the results in their own work.
\item Communication as the strategic and targeted measures for promoting the project
and its results to a multitude of audiences, including the media and the public, and possibly
engaging in a two-way exchange. The aim is to reach out to society as a whole and
in particular to some specific audiences while demonstrating how EU funding contributes to tackling
societal challenges.
\end{itemize}
\subsubsection{Dissemination and exploitation of results}
\WPref{education} is focused on dissemination of \TheProject work.
Our goal is to facilitate Open Science through the development and use of open and freely available tools.
All \TheProject software will be made publicly and freely available under open source licenses, and
hosted on public code hosting sites such as GitHub.
Most \TheProject work will be in the form of
contributions to existing projects,
which will be governed by the licenses of those projects.
All Jupyter and Binder software is released under the permissive BSD license,
which explicitly allows commercial exploitation
and has proven successful in enabling collaborations with industrial partners
such as Google, Microsoft, and IBM.
This means that all \TheProject software will be available and accessible to all who find it,
at no cost to \TheProject,
enabling long-term access beyond the funding of \TheProject.
Similarly, non-code products such as dissemination works
(workshop materials, etc.)
will be made freely available under open Creative Commons licenses.
As a result, the primary dissemination effort is to:
\begin{enumerate}
\item make sure that prospective users are \textbf{aware of the work}, and
\item enable them to use the tools through \textbf{learning resources, training, and services}.
\end{enumerate}
Our focus for dissemination will be on \taskref{education}{workshops}:
running workshops that train various communities in the availability,
purpose, development, and use of \TheProject software and services.
We will make a particular effort to use these workshops as an opportunity
to \textbf{support diversity and inclusion in the Open Science community},
by running workshops for under-served and under-represented groups in the academic and
open source communities.
Additionally, for long-term resources available to the wider community
who will not be able to attend workshops,
we will produce \textbf{free, online materials for training} in the use of \TheProject
software and services.
These resources will be hosted on free, public hosting services,
such as GitHub Pages,
enabling long-term access to the work of \TheProject,
even after the end of funding.
The operation of services in \WPref{eosc} is also a dissemination activity,
as services like Binder not only enable Open Science by facilitating interactive publications,
they also enable \textbf{interactive demonstration of tools and functionality}
developed in \TheProject, e.g. Xeus (\taskref{ecosystem}{xeus-cpp}).
We have budgeted \euro 5000 per month for the operation of services in \WPref{eosc},
to be spent by \site{EGI} on cloud computing resources for service hosting.
This cost estimate is based on the operation costs of mybinder.org.
Hosting via EGI helps the project reach a sustainable setup,
because we can negotiate hosting conditions with institutes that are dedicated to supporting
open science and can co-fund the operational costs from their own budgets.
\TheProject, in collaboration with the operators of mybinder.org,
will explore sustainability plans for covering long-term costs of operating such services,
including institutional subscription models, donations, and others.
\medskip
\noindent \textbf{Data management plan}\label{sec:data-management-plan}\\
Except for the usage data described below,
\TheProject activities will not generate or collect data.
While we have many demonstrators that interact with data, they do not generate or collect that
data themselves, but rather provide analytical mechanisms or access to data governed by
existing data management plans and data policies of project partners at each site,
as well as publicly accessible open data.
\noindent \textbf{Service usage data} \\
Any data collected through the operation of public services such as Binder (\taskref{eosc}{eu-binder})
(e.g. popularity data for public open science repositories)
will be fully anonymised in line with relevant privacy best practices and regulations, such as GDPR,
and made publicly available in the standard JSON Lines format,
as is done already for mybinder.org \cite{mybinder-archive}.
This data is very small and easily archived on free hosting services such as GitHub,
and will be made available under the Creative Commons Universal Public Domain Dedication (CC0).
There is no cost to the project associated with archiving this data long-term.
\subsubsection{Communication activities}
The main remaining goal for dissemination is making sure that potential users
are aware of the tools and services developed by \TheProject.
In order to maximise this impact, it is vital to address the audience as one project
and ensure the immediate recognition of information stemming from it.
Together with all partners involved, \TheProject will therefore build \textbf{a strong project identity}.
The following design and communication elements will be used to strengthen the project's
uniformity and identity and to deliver clear messages to our audience: \TheProject naming, logo,
presentation template, templates for reports and letters, project posters and leaflets, etc.
In addition to \TheProject-organised workshops in \taskref{education}{workshops},
the primary mechanism by which we will communicate our results is through publications and conferences.
All publications funded by \TheProject will be \textbf{Open Access},
and sites expecting publications have budgeted funds for paying Open Access fees.
We will identify and attend appropriate conferences for sharing our work,
including running tutorials at conferences in historically interested communities such as PyData and SciPy.
We will also identify and attend conferences from complementary communities such as ROpenSci,
Mozilla Science, and Julia,
as well as domain-specific conferences, to maximise the impact of \TheProject and to broaden its
audience beyond the communities traditionally reached.
We will operate a \textbf{website} (\taskref{education}{website})
for collecting and sharing information about \TheProject and its progress.
It will provide a centralised way to access the various publicly available deliverables,
publications and articles related to the project.
The site will be regularly updated over the lifetime of the project
with the project publications and public materials, such as flyers, posters and
public deliverables, as well as organised workshops, available services, news, etc.
Site analytics will be enabled for the project website
to provide insight into how to improve its impact. In addition, the project intends to
develop its presence on \textbf{social and content
networks}, such as Twitter and Facebook. These channels will be used for interaction
with the professional community as well as the general public,
with content differentiated per channel according to the target group addressed.
As part of the project's communication plan, \TheProject will develop a social media strategy
in order to increase outreach and social impact, which can be summarised as follows: (a) identifying target
audience and key stakeholders, (b) updating social media content and sparking
discussion in social media/tweeting, (c) measuring social impact and reassessing
social media strategy as required.