-
Notifications
You must be signed in to change notification settings - Fork 7
/
project_work.Rtex
283 lines (210 loc) · 13.1 KB
/
project_work.Rtex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
\documentclass[11pt,a4paper]{article}
\usepackage[latin1]{inputenc}
\usepackage{hyperref}
\usepackage{listings}
\usepackage{xcolor}
\hypersetup{
colorlinks,
linkcolor={red!50!black},
citecolor={blue!50!black},
urlcolor={blue!80!black}
}
% Normal latex T1-fonts
\usepackage[T1]{fontenc}
% Figures
%\usepackage{graphicx,fancybox,subfigure}
% AMS-stuff
\usepackage{amsmath,amsfonts,amssymb}
\usepackage{amsbsy}
% Misc
\usepackage{verbatim}
\usepackage{url}
% Page size
\addtolength{\hoffset}{-1cm}
\addtolength{\textwidth}{2cm}
\addtolength{\voffset}{-1cm}
\addtolength{\textheight}{2cm}
% Paragraph
\setlength{\parindent}{0pt}
\setlength{\parskip}{1ex plus 0.5ex minus 0.2ex}
% Horizontal line
\newcommand{\HRule}{\rule{\linewidth}{0.5mm}}
% No page numbers
\pagestyle{empty}
% New commands:
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}
\newcommand{\E}{\mathbb{E}}
\newcommand{\Prob}{\mathbb{P}}
\begin{document}
\begin{titlepage}
\center
\textsc{\LARGE Uppsala University}\\[1.5cm] % Name of your university/college
\includegraphics[scale=.2]{files/uu_logo.png}\\[1cm]
\textsc{\Large Bayesian Statistics and Data Analysis}\\[0.5cm]
\textsc{\large }\\[0.5cm]
%\HRule \\[0.4cm]
{ \huge \bfseries Mini-Project Instructions}\\[0.4cm] % Title of your document
%\HRule \\[1.5cm]
\vfill
\end{titlepage}
% General Knitr options
% (this cannot be input since the file runs knitr before LaTeX)
<<echo=FALSE,eval=TRUE>>=
options(continue=" ", prompt="> ")
knitr::opts_chunk$set(size = "small")
@
\tableofcontents
\newpage
\section{Project Instructions}
The last two weeks will focus on a course project where 2-3 students choose data and will do a Bayesian analysis of a real-world dataset.
Requirements for the projects are:
\begin{itemize}
\item Your project should be a Bayesian Data Analysis using Stan (you can use \texttt{brms} or \texttt{rstanarm} if you like. Although, this is easier and will be considered when grading, i.e. it will be harder to get a VG in the mini-project.
\item Real data should be used (see below for details).
\item At least two different models should be estimated and compared.
\end{itemize}
\emph{Tip!} Write the report so you can add your final report as an example of work you have done to future potential employers.
\emph{For PhD students}: You can choose to make a small project related to your research interest instead. Although, it should still be a 4 page paper output.
\subsection{Suggested Reading/Video material}
The project will be a small practical exercise in Bayesian data analysis. To get some inspiration, see the Stan YouTube channel \href{https://www.youtube.com/watch?v=k9sH7x8O0Y8}{[here]} or the Stan User Guide \href{https://mc-stan.org/docs/2_29/stan-users-guide/index.html}{[here]}
\subsection{Project Group and Expected Workload}
It is possible to have only one student in a group, although this is not recommended. One student group will, in practice, mean additional work due to the requirements of the project.
The project is expected to take 40h per student in the group. Hence a 3 group project should be the equivalent of a 120h project.
One student (first author) is the corresponding author. Thats the one that submit the paper on studium.
\subsection{Data Sets and Methods Recommendations}
We recommend that you find a dataset you are interested in using yourself, ideally in a field you find interesting. Feel free to discuss potential projects with the teacher.
If you have a hard time finding a dataset to use, there are a lot of available datasets (and problems) at:
\begin{itemize}
\item The UCI Machine Learning repository: \href{https://archive.ics.uci.edu/ml/index.php}{[here]}
\item The machine learning competition site Kaggle: \href{https://www.kaggle.com/}{[here]}
\end{itemize}
\emph{Hint!} Use a data set with a natural grouping (e.g. patients in hospitals, car emissions by different brands, data with a time dimension etc) since that will help you to use ideas of hiearchical modeling from later in the course.
The following data sets should not be used in the project:
\begin{itemize}
\item Titanic (R data set)
\item mtcars (R data set)
\end{itemize}
Also, two different groups are not allowed to use the same dataset.
\subsection{Project Proposal (Part 1 and Part 2)}
Students need to turn in a one-page project proposal and data description by and get approval for the proposed project. The Project proposal should follow the ICML paper format that can be found \href{https://icml.cc/Conferences/2020/StyleAuthorInstructions}{[here]}.
The ICML format is also available in overleaf here: \href{https://www.overleaf.com/latex/templates/icml-2019-submission-template/vkqjjvzjvhdc}{[here]}. We recommend using Overleaf for writing the proposal.
\emph{The project proposal should include all the group members names!}
In the Project proposal part 1, the following parts should be included. See Project report below on the details on each section.
\begin{enumerate}
\item Title
\item Abstract (1 paragraph)
\item Introduction (roughly 0.5 page)
\item Data (roughly 0.5 page)
\item Models (roughly 0.25 page)
\emph{Note!} For proposal part 1, you only need the most simple model to model the actual data.
\end{enumerate}
For Project Proposal part 2, you need to add the following to your report:
\begin{enumerate}
\item Stan code in the appendix for the first proposed model (from part 1)
\item The first model estimated and added to the Results section of the project proposal.
\end{enumerate}
\subsection{Project Report}
\label{project_report}
The Project outcome is a report in the ICML paper format that can be found \href{https://icml.cc/Conferences/2020/StyleAuthorInstructions}{[here]}.
The ICML format is also available in overleaf here: \href{https://www.overleaf.com/latex/templates/icml-2019-submission-template/vkqjjvzjvhdc}{[here]}. We recommend using Overleaf for writing the report.
The paper should consist of \emph{between three and a half (3.5) and four (4) pages}, excluding references and eventual appendices. Write the report as you would do in a real situation, i.e. do not refer to the paper as a "mini-project" or similar.
\emph{The project report should include all the group members names!}
The paper should include the following parts/sections:
\begin{enumerate}
\item Title
\begin{itemize}
\item The title should describe the problem and be like a real article title, i.e. don't write "Mini-project:" or similar in the title.
\end{itemize}
\item Abstract (1 paragraph)
\item Introduction (roughly 0.5 page)
\emph{Note!} The introduction should be as an introduction to an article, i.e. describe the problem and include at least one reference.
\begin{itemize}
\item Description of the problem.
\end{itemize}
\item Data (roughly 0.5 page)
\begin{itemize}
\item Description of the data, e.g. the number of observations, the number of groups (if you intend to use a hierarchical model), what is the dependent variable etc.
\end{itemize}
\item Models and methods (roughly 0.5 page)
\begin{itemize}
\item Description of the models \emph{Note!} Stan code for all models should be included as an Appendix.
\item Description of how the models were compared (LOO/WAIC)
\end{itemize}
\item Results (roughly 1.5-2 pages)
\begin{itemize}
\item Results of the different models.
\item Estimated parameters of interest together with their posterior uncertainty.
\item Which model does seem to work the best, and why?
\end{itemize}
\item Conclusions (roughly 0.5-1 pages)
\begin{itemize}
\item Conclusions from the results.
\item Discussion of problems and potential improvements and other models
\end{itemize}
\item Acknowledgements (optional)
\begin{itemize}
\item If you are willing to let me use your project report as an example in the next course, please state that in the Acknowledgements. Just add the following sentence: "We hereby grant our consent for the utilization of this project report as a reference material within the context of future editions of the course."
\item Other people you might want to thank.
\end{itemize}
\item Appendix
\begin{itemize}
\item Stan code for all models used in the report should be included in the appendix
\item Other results, Figures etc that you think might be of interest to some readers.
\end{itemize}
\end{enumerate}
\paragraph{Additional requirements for the report}
\begin{enumerate}
\item All Figures using color should have a color-blind friendly color palette. See \href{http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#a-colorblind-friendly-palette}{here} and \href{https://rdrr.io/cran/ggthemes/man/colorblind.html}{here}.
\item The final report should look like a research paper/article, i.e. try to avoid bullet list and get a good flow in the text. Also do not refer to the report as a mini-project. See it as a real report (or short paper).
\item All model included in the report should have the Stan code included in the appendix.
\item You should use correct reference systems. A tip is to use \texttt{\\citet}, \texttt{\\citep}, and \texttt{\\bibtex}. This will also simplify your future thesis work.
\item You should include some (or all) estimated parameters together with their uncertainty and $\hat{R}$ and interpret them in the report. You are free to include them in the appendix.
\end{enumerate}
\paragraph{Additional hints for the report}
\begin{enumerate}
\item Before you turn in the project, do a language check with a tool such as Grammarly. A project will poor english (errors that would have been spotted with a tool such as Grammarly) will affect your grade downwards.
\item Remember to standardize the independent variables (see Stan recommendations).
\end{enumerate}
\subsection{Project Presentations}
Presentation details:
\begin{itemize}
\item Each project needs to be presented in addition to submitting the mini-project report
\item The presentation should be high level, but sufficiently detailed information should be readily available to facilitate answering questions from the audience
\item For 1-2 person groups, the presentation should be 10 minutes
\item For three-person groups, the presentation should be 15 minutes
\item Afterwards, questions will be asked first by other students and then by attending teachers.
\end{itemize}
Specific presentation recommendations:
\begin{itemize}
\item The first slide needs to include the project title and names of the group members.
\item The chosen methods(s) should be explained and justified (you are \textit{not} holding this presentation for a hypothetical customer who doesn't care about the details of your methods).
\item Big enough font size for text and figure labels should be used to make it easy for the audience to read slides.
\item A good rule of thumb is to expect one slide to take 2 minutes to present.
\item The last/final slide needs to include your conclusion and names of the group members.
\end{itemize}
\subsubsection*{Missing the project presentation}
Suppose you cannot attend the presentations due to sickness or similar. In that case, you will have to turn in a video presentation where you present the whole project (i.e. the entire presentation). The teacher then grades this presentation.
To be able to get VG on the project the presentation needs to be turned in the day before the presentation at 23.59 the latest.
\subsection{Project Grading}
Below are the criteria used when grading the mini-projects. Some general comments on grading are:
\begin{enumerate}
\item The more students the higher the quality expected of the project, i.e. a better report is expected from a three-student report than a two-student report.
\end{enumerate}
To pass the report (G), the following criterias should be fulfilled:
\begin{enumerate}
\item The report should be turned in and follow the general outline of Section \ref{project_report}.
\item The report should follow the additional requirements in Section \ref{project_report}.
\item show basic knowledge and understanding of the core concepts of the course by using concepts correctly
\item show an understanding in when certain methods should be used or not, and how
\item use at least two (2) \emph{Bayesian} different models and compare them in a correct way (two linear regression models with different covariates is an example of two different models)
\item state what has been done in the report with clarity, good english and rigour so it is easy for a reader to understand and follow the paper.
\item correctly use references in the report following the guideline of the template in Section \ref{project_report}
\end{enumerate}
To pass the mini-project with distinction (VG), the following criterias also apply in addition to the criteria for passing the report above:
\begin{enumerate}
\item show deep knowledge knowledge and understanding of the core concepts and how to adapt them in a good way to a new situation
\item connect the analysis in the report with other areas in statistics or previous courses taken in the masters program, i.e. not just repeat what has been done in previous labs.
\item use models that has not been part of the BSDA course (but for example presented in other courses)
\end{enumerate}
\end{document}