Praxis Data Server (https://spineimage.ca) #77

Open
kousu opened this issue May 20, 2021 · 53 comments

@kousu
Member

kousu commented May 20, 2021

https://praxisinstitute.org wants to fund a Canada-wide spine scan sharing platform.

They were considering paying OBI as a vendor to set up a neuroimaging repository, but they had doubts about the quality of that solution, looked around for others, and landed on asking us for help.

We've proposed a federated data sharing plan and they are interested in pursuing this line.

Needs

  • publication (marketing, accessibility): datasets need to be obvious enough and easy enough to get at that they get used
  • curation: datasets need to be kept in a high quality state, with good metadata and artifacts corrected
    • this means specifying formats (like BIDS, nifti, etc)
    • having scripts to automatically check these formats like we do in spine-generic
    • specifying a checklist of manual curation steps
    • training a curator at each participating hospital/site
  • data protection: datasets must have effective ACLs attached; the ACLs should implement the data sharing and consent agreements that each source study gets
    • this is one of the motivations for making a federated system: each jurisdiction operates under slightly different data protection laws and making a single site for everyone is legally fraught
  • uploading: data needs to get from the scanners into the data archive
    • this is, from our source on the inside, the single most expensive and difficult part. Most hospitals are running a proprietary PACS system where they store their images; extracting the images from that to a regular computer is often extremely tedious, manual, and expensive.
    • so, part of the project is to specify a format (as above) and aid each site in writing scripts to get data from their PACS system into that format and uploaded
  • versioning: data should be versioned, so that work can be reproduced
  • mirroring: data should be easily backed up from one site to another
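
As a rough illustration of the automated format checks mentioned under curation, here is a minimal sketch in Python. It only checks that NIfTI file names look like BIDS names and that each image has a JSON sidecar next to it. The regex is a simplification assumed for this example, not the full BIDS grammar, and the sidecar check ignores BIDS inheritance (sidecars placed higher up the tree); a real deployment would more likely call the official validator.

```python
import re
from pathlib import Path

# Simplified pattern (an assumption for this sketch, not the full BIDS grammar):
# sub-<label>, optional further key-value entities, then a suffix such as T1w or T2w.
BIDS_NAME = re.compile(r"^sub-[A-Za-z0-9]+(_[a-z]+-[A-Za-z0-9]+)*_[A-Za-z0-9]+\.nii(\.gz)?$")


def check_dataset(root):
    """Return a list of human-readable problems found under `root`."""
    problems = []
    for image in sorted(Path(root).rglob("*.nii*")):
        if not BIDS_NAME.match(image.name):
            problems.append(f"{image.name}: not a BIDS-style file name")
            continue
        # BIDS keeps acquisition metadata in a .json sidecar next to the image.
        stem = image.name.split(".nii")[0]
        if not (image.parent / f"{stem}.json").exists():
            problems.append(f"{image.name}: missing JSON sidecar")
    return problems
```

An empty list means the dataset passed these two checks; a curator could run this before every upload.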
@kousu
Member Author

kousu commented May 20, 2021

Software Design

Global Architecture

                             bidirectional mirroring
                                           │
                                           │                           ┌───────────────┐
                                      ┌────┼──────────────────────────▲│               │
                                      │┼───────────────────────────────┤ data.site4.ca │
                                      ││                               │               │
                                      ││                               ├────────┬────┬─┘
                                      ││                         ┌─────┴────────┘    │
                                      ││                         │                   │
                                      ││                         │                   │
                      ┌───────────────┘▼┐                        │                   │
                      │                 │                        │                   │
                      │ images.site5.ca │                        │                   │
                      │                 │                        │                   │
                      └─────────────────┴─┐                      │                   │
                                          │                      │                   ├──────mirroring
                                          │                      ├────notifying      │
                                          │                      │                   │
                                          │                      │                   │
                                          │                      │                   │
                                       xxx│xxxxxxxxxxxxxxxxxxxxxx▼xxxxxx             │
                                       x┌─▼───────────────────────────┐x             ▼
   ┌─────────────────┐                 x│                             │x         ┌─────────────────┐
   │                 │                 x│   data.praxisinstitute.org   │x         │imagerie.site7.ca│
   │ spines.site6.ca ├─────────────────►│                             │◄─────────┴─┬───────────────┘
   │                 │                 x└─────────────────────────────┘x           │
   └──────────┬────┬─┘                 xxxx▲xxxxxxx▲xxxxxxxxxxxxxxx▲xxxx           │
              │    │                       │       │               │               │
              │    │                       │       │               │               │
              │    │                       │       │               │               │      ┌───────────────┐
              │    │                       │       │               ├───────────────┼──────┘ data.site3.ca │
              │    └───────────────────────┼───────┼───────────────┴───────────────┼──────▼─┬─────────────┘
              ▼                            │       │                               │        │
┌────────────────┐                         │       │                               │        │
│                ├─────────────────────────┘       │                               │        │
│ data.site1.ca  │                                 │                               │        │
│                │                                 │                               │        │
└─────────────┬─▲┘                                 │                               │        │
              │ │                                  │                               │        │
              │ │                                  │                               │        │
              │ │                     ┌────────────┴┐                              │        │
              │ └─────────────────────┤             │                              │        │
              │                       │data.site2.ca◄──────────────────────────────┼────────┘
              └──────────────────────►│             │                              │
                                      └─────────────┘◄─────────────────────────────┘


Per-Site Architecture

┌──────────────────┐            ┌──────────────────┐           ┌───────────────────┐           ┌───────────────────────────┐
│                  │            │                  │           │                   │           │                           │
│                  │            │                  │           │                   │           │                           │
│                  │            │                  │           │                   │           │                           │
│                  │            │                  │           │                   │           │                           │
│       PACS       ├─────┬──────►       BIDS       ├─────┬────►│  data.example.ca  ├─────┬────►│  data.praxisinstitute.ca  │
│                  │     │      │                  │     │     │                   │     │     │                           │
│                  │     │      │                  │     │     │   (data server)   │     │     │         (portal)          │
│                  │     │      │                  │     │     │                   │     │     │                           │
│                  │     │      │                  │     │     │                   │     │     │                           │
└──────────────────┘     │      └──────────────────┘     │     └───────────────────┘     │     └───────────────────────────┘
                         │                               │                               │
                         │                               │                               │
                         │                               │                               │
                         │                               │                               │
                   export scripts                     uploader                        notifier
                written by each site.               written by us                   written by us
                                             (`git push`, `rsync -a`, etc)      (cronjob: curl data.praxisinstitute.ca/notify ...)


Components

It's easy to spend a lot of money writing software from scratch. I don't think we should do that. I think what we should do is design some customizations of existing software and build packages that deploy them.

Data Servers

I have two options in mind for the data servers:

  • GIN - this is a datalad server
  • NextCloud - this is a WebDAV (and more) server

GIN is itself built on top of pre-existing open source projects: Gogs, git-annex, datalad, git, combined in a customized package. We would take it and further customize it. It is a little more sprawling than NextCloud. Being focused on neuroscience, we could easily upstream customizations we design back to them to help out the broader community.

NextCloud is a lot simpler to use than datalad. You can mount it onto Windows, Linux, and macOS as a cloud disk (via WebDAV). It also has a strong company behind it, lots of users, good apps. It's meant for more general use than science; actually, it was never designed for science. It would be harder to share any improvements we make to it; though we could publish our packages and any plugins back to the wider NextCloud ecosystem. It has some other weaknesses too.

Uploading/Downloading

With GIN, uploading can be done with git:

  1. Follow https://github.com/neuropoly/data-management/blob/master/git-annex.md#new-repo, or something like it (these are our instructions; different sites might disagree; this is one of the ways GIN/datalad is a bit more ad-hoc and difficult)
  2. Do this:
    git remote add origin [email protected]:my-new-dataset.git
    git push -u origin master # or, actually, maybe GIN forces you to first make the remote repo in its UI? Unsure
    git annex sync --content
    

Downloading is replacing the first two lines with git clone.

Windows and macOS do not have a git client built in.

With NextCloud, uploading can be done with davfs2 + rsync:

mount -t davfs https://data.site1.ca/remote.php/dav/files/USERNAME/ /mnt/data  # something close to this anyway; the davfs2 package's filesystem type is "davfs"
rsync -av my-new-dataset /mnt/data/

Downloading is just reversing the arguments to rsync.

There's also cadaver, and Windows and macOS have WebDAV built in.

Versioning

GIN is based on git, so it has very strong versioning.

There are git fsck and git annex fsck to validate that what's on-disk is as expected.
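
To make "validate that what's on-disk is as expected" concrete: git-annex names each annexed file by a key that embeds its size and checksum, e.g. `SHA256E-s<size>--<sha256>.<ext>` for the SHA256E backend. The Python sketch below re-derives that check for a single file. It is a simplified illustration under that assumption about the key layout, not git-annex's own code; in practice you would just run git annex fsck.

```python
import hashlib
import re

# Layout assumed for this sketch: SHA256E-s<size in bytes>--<hex digest><optional extension>
KEY = re.compile(r"^SHA256E-s(?P<size>\d+)--(?P<digest>[0-9a-f]{64})(?P<ext>\..*)?$")


def matches_key(path, key):
    """Check that the file at `path` has the size and SHA-256 digest recorded in `key`."""
    m = KEY.match(key)
    if m is None:
        raise ValueError(f"not a SHA256E key: {key!r}")
    sha = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Hash in chunks so large scans do not have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
            size += len(chunk)
    return size == int(m["size"]) and sha.hexdigest() == m["digest"]
```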

NextCloud only supports weak versioning.

But maybe we can write a plugin that improves that. Somehow. We would have to figure out a way to mount an old version of a dataset.

Permissions

NextCloud has federated ACLs built in: users on data.site1.ca can grant chosen users on spines.site6.ca access to specific folders A/, B/ and D/.

I am unsure what GIN has; since it's based on Gogs, it probably has public/private/protected datasets, all the same controls that Github and Gitlab implement, but I don't think it supports federated ACLs. Federation with GIN might look like everyone having to have one account per site.

But maybe we could improve that; perhaps we could patch GIN to support federated ACLs as well. We would need to study how NextCloud does it, how GIN does it, and see where we can fit them in.

Sharing

In a federated model, data-sharing is done bidirectionally: humans at each site grant each other access to data sets, one at a time.

We should encourage the actual data sharing to happen via mirroring, for the sake of encouraging resiliency in the network.

I think we should encourage the

Gitlab supports mirroring; on https://gitlab.com/$user/$repo/-/settings/repository you will find the mirror settings

Gitlab Example

(screenshot: GitLab's repository mirroring settings)

We need to replicate this kind of UI in whatever we deploy.

Portal

For the portal, we can probably write most of it using a static site generator like hugo, plus a small bit of code to add (and remove) datasets.

The dataset list can be generated either by pushing or pulling: the data servers could notify the portal (this is how I've drawn the diagrams above) or the portal could connect to the data servers in a cronjob to ask them what datasets they have. Generally, the latter is more stable, but the former is more accurate.
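
A sketch of the push variant, in Python: the data server builds a small JSON message and POSTs it to the portal. The `/notify` path comes from the per-site diagram; the JSON field names, the `https://` scheme and the `build_notification` helper are assumptions made up for this example, since the notifier protocol has not been designed yet.

```python
import json
import urllib.request


def build_notification(portal, site, datasets):
    """Build (but do not send) the POST request announcing `datasets` hosted at `site`."""
    body = json.dumps({"site": site, "datasets": sorted(datasets)}).encode("utf-8")
    return urllib.request.Request(
        f"https://{portal}/notify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The cronjob would then send it with `urllib.request.urlopen(req, timeout=30)` and log any failure so that the next run retries.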

It should be possible to keep the list of datasets, one per file, in a folder on the portal site, and have hugo automatically read them all and produce a big, cross-referenced, searchable index.
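
Here is a minimal Python sketch of the "small bit of code to add (and remove) datasets", assuming the portal keeps one file per dataset in a Hugo content folder. The folder location, the field names and the function names are all hypothetical; the only Hugo feature relied on is that a content file may begin with JSON front matter.

```python
import json
import re
from pathlib import Path


def slug(name):
    """Turn a dataset name into a safe file name."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def add_dataset(content_dir, name, host, url):
    """Write (or overwrite) the page describing one dataset; return its path."""
    page = Path(content_dir) / f"{slug(name)}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical fields; hugo would read them as the page's front matter.
    front_matter = {"title": name, "host": host, "dataset_url": url}
    page.write_text(json.dumps(front_matter, indent=2) + "\n", encoding="utf-8")
    return page


def remove_dataset(content_dir, name):
    """Delete a dataset's page; return False if it was not listed."""
    page = Path(content_dir) / f"{slug(name)}.md"
    if not page.exists():
        return False
    page.unlink()
    return True
```

After either call the portal would rerun hugo to regenerate the index.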

Packaging

We should provide an automated installer for each site to deploy the software parts. It should be as automated as possible: no more than 5 manual install steps.

We can build the packages in:

  • ansible
  • conda
  • custom bash install scripts
  • .deb

I think .deb is the smoothest option here; I have some experience using pacur, and we can use their package server to host the .debs. Whatever we pick, the important thing is that we deliver the software reproducibly and with as little manual customization as possible.

I think we should produce two packages:

  • praxis-data-server - the per-site software
  • praxis-index-server - the portal site

We might also need to produce uploader scripts, but because praxis-data-server will use standard upload protocols it's not as important to do so. Moreover, because the uploader will be run directly by users, it will need to deal with cross-platform issues, which makes it harder to package. I think, at least as a first draft, we should just document what command lines will get your data uploaded.

Software Development Tasks

  • write specification for what a proper dataset looks like; probably BIDS, maybe with some extra restrictions

  • deploy data server prototypes

  • write data server customizations

  • write uploader scripts?

  • write PACS -> BIDS scripts that conform to that specification

    • This is an ad-hoc, per site task. We can help, but we can't write these scripts ourselves.
  • write portal site

  • write data server -> portal notifier

    • sender
    • receiver
  • write packages

    • data server
    • portal server
  • write uploading documentation

  • write downloading documentation

  • write ACL documentation

Open Questions

  • is datalad (the GIN client) compatible with Windows? Is it compatible with Windows without using WSL?
  • Is there a way to build software support for attaching data sharing agreements to the permissions?
    • I'm envisioning a situation where, next to a mirror of a dataset, its data sharing agreement .pdf is linked; perhaps even some of the terms, like expiry dates, can be programmed in.
  • NextCloud or GIN? Or something else?
  • Should we focus on trying to develop federated ACLs for GIN, or
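
To make the data-sharing-agreement question more concrete, here is one way it could be modelled, sketched in Python: each grant names the remote user, the folder, the agreement document and an expiry date, and access is allowed only while a matching grant is still valid. Every name here (`Grant`, `can_read`, the example fields) is hypothetical; I am not claiming either GIN or NextCloud works this way.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    user: str        # federated identity, e.g. "alice@spines.site6.ca"
    folder: str      # dataset folder the grant covers
    agreement: str   # path or URL of the signed data sharing agreement (.pdf)
    expires: date    # last day on which the grant is valid


def can_read(grants, user, path, today):
    """True if `user` holds an unexpired grant covering `path`."""
    for g in grants:
        # A grant on "A/" covers "A/" itself and everything beneath it, but not "AB/".
        covers = path == g.folder or path.startswith(g.folder.rstrip("/") + "/")
        if g.user == user and covers and today <= g.expires:
            return True
    return False
```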

Summary

We can build a federated data system on either GIN or NextCloud. Either one will require some software development to tune it for our use case, but much less than writing a system from scratch. Both rely on widely supported network protocols, which makes them cross-compatible and reliable, and avoids the cost of developing custom clients.

@kousu
Member Author

kousu commented May 20, 2021

Cost Estimates

  • 2 devs * 1 month to produce the packages: ~10k-20k $
  • N*5k$ - per-site server hardware
  • N*5k$ - per-site sysadmin time
  • N*40k$ - per-site data curator time

Which works out to about 18 or 19 sites that Praxis can fund this year.
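
The arithmetic behind that, as a small Python sketch. The per-site figures are the ones listed above (50k$ per site in total). The budget is NOT stated anywhere in this thread: the value below is a hypothetical number chosen only because it reproduces the "18 or 19" result.

```python
PER_SITE = {"server hardware": 5_000, "sysadmin time": 5_000, "data curator time": 40_000}


def sites_fundable(budget, dev_cost):
    """Number of sites that fit in `budget` once the one-off packaging cost is paid."""
    per_site = sum(PER_SITE.values())  # 50k$ per site
    return max(0, (budget - dev_cost) // per_site)


# HYPOTHETICAL budget: the thread does not state the real figure.
BUDGET = 960_000
print(sites_fundable(BUDGET, dev_cost=20_000))  # high packaging estimate
print(sites_fundable(BUDGET, dev_cost=10_000))  # low packaging estimate
```

The point is mostly that curator time dominates: the one-off packaging cost barely moves the number of sites.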

@kousu
Member Author

kousu commented May 20, 2021

I'm going to make some demos to make this more concrete.

I'm starting with NextCloud. I'm going to deploy 3 NextClouds and configure them with federation sharing.

Hardware

The first thing to do is to get some hardware. Vultr.com has cheap VPSes. I bought three of them in Canada (the screenshot covered my datacenter selection; trust me, it's in Canada):

(screenshot: Vultr's "Deploy Servers" page)

Notice I'm choosing Debian, but any Linux option would work.

Just gotta wait for them to deploy....

2021-05-20-135220_1252x273_scrot

And they're up:

2021-05-20-135306_1265x225_scrot

Networking

The second thing to do is set up DNS so these are live net-wide servers.

I went over to my personal DNS server and added

(three screenshots of the new DNS records)

Now just gotta wait for that to deploy...

and they're up too:

[kousu@requiem ~]$ dig data1.praxis.kousu.ca
216.128.176.232
[kousu@requiem ~]$ dig data2.praxis.kousu.ca
216.128.179.150
[kousu@requiem ~]$ dig data3.praxis.kousu.ca
149.248.50.100

Sysadmin Access Control

I just need to make sure I have access secured. I'm going to do two things: enroll my ssh key for root (and turn off root password login), and add a sudo account as a backup.

I go to the VPS settings one at a time and grab the root passwords:

2021-05-20-140156_384x187_scrot

then I log in, and confirm the system looks about right:

[kousu@requiem ~]$ ssh [email protected]
[email protected]'s password: 
Linux data1.praxis.kousu.ca 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@data1:~# ls /home/
root@data1:~# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Then I use ssh-copy-id to enroll myself:

[kousu@requiem ~]$ ssh-copy-id -i ~/.ssh/id_ed25519 [email protected]
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/kousu/.ssh/id_ed25519.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
[email protected]'s password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh '[email protected]'"
and check to make sure that only the key(s) you wanted were added.

[kousu@requiem ~]$ ssh [email protected]
Linux data1.praxis.kousu.ca 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu May 20 17:59:53 2021 from 192.222.158.190
root@data1:~# 

And now that that works, I disable root password login, which is a pretty important security baseline:

root@data1:~# sed -i 's/^PermitRootLogin yes/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config 
root@data1:~# systemctl restart sshd

In a different terminal, without disconnecting the first one in case we need to do repairs, verify that this worked by checking:

  1. that I can still ssh in using the key

    [kousu@requiem ~]$ ssh [email protected]
    Linux data1.praxis.kousu.ca 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64
    
    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.
    
    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    Last login: Thu May 20 18:06:42 2021 from 192.222.158.190
    
  2. that, when I tell ssh to only use password auth, it rejects me

    [kousu@requiem ~]$ ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no [email protected]
    [email protected]'s password: 
    Permission denied, please try again.
    

I'm also going to add a sudo account, as a backup:

First, invent a password:

[kousu@requiem ~]$ pass generate -n -c servers/praxis/data1.praxis.kousu.ca

Then make the account:

root@data1:~# sed -i 's|/bin/sh|/bin/bash|' /etc/default/useradd 
root@data1:~# useradd -m kousu
root@data1:~# passwd kousu
New password: 
Retype new password: 
passwd: password updated successfully
root@data1:~# usermod -a -G sudo kousu

Test the account:

[kousu@requiem ~]$ ssh data1.praxis.kousu.ca
[email protected]'s password: 
Linux data1.praxis.kousu.ca 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu May 20 18:14:38 2021 from 192.222.158.190
$ sudo ls

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for kousu: 
$ groups
kousu sudo
$ 

So this means I have two ways in: root password login over ssh is disabled, and my own user password is long and secure.

Now repeat the same for data2.praxis.kousu.ca and data3.praxis.kousu.ca.

Basic config

Set system hostname -> already done by Vultr, thanks Vultr

(and repeat for each of the three)

updates

root@data1:~# apt-get update && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y
Hit:1 http://deb.debian.org/debian buster InRelease
Get:2 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:3 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:4 http://security.debian.org/debian-security buster/updates/main Sources [185 kB]
Get:5 http://security.debian.org/debian-security buster/updates/main amd64 Packages [289 kB]
Get:6 http://security.debian.org/debian-security buster/updates/main Translation-en [150 kB]
Fetched 740 kB in 0s (2283 kB/s)                          
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

(and repeat for each of the three)

unattended-upgrades

apt-get install unattended-upgrades

Configure it like I've done for our internal servers: enable regular updates, not just security ones, do updates once a week, enable auto-reboot.

/etc/apt/apt.conf.d/50unattended-upgrades
// Unattended-Upgrade::Origins-Pattern controls which packages are
// upgraded.
//
// Lines below have the format format is "keyword=value,...".  A
// package will be upgraded only if the values in its metadata match
// all the supplied keywords in a line.  (In other words, omitted
// keywords are wild cards.) The keywords originate from the Release
// file, but several aliases are accepted.  The accepted keywords are:
//   a,archive,suite (eg, "stable")
//   c,component     (eg, "main", "contrib", "non-free")
//   l,label         (eg, "Debian", "Debian-Security")
//   o,origin        (eg, "Debian", "Unofficial Multimedia Packages")
//   n,codename      (eg, "jessie", "jessie-updates")
//     site          (eg, "http.debian.net")
// The available values on the system are printed by the command
// "apt-cache policy", and can be debugged by running
// "unattended-upgrades -d" and looking at the log file.
//
// Within lines unattended-upgrades allows 2 macros whose values are
// derived from /etc/debian_version:
//   ${distro_id}            Installed origin.
//   ${distro_codename}      Installed codename (eg, "buster")
Unattended-Upgrade::Origins-Pattern {
        // Codename based matching:
        // This will follow the migration of a release through different
        // archives (e.g. from testing to stable and later oldstable).
        // Software will be the latest available for the named release,
        // but the Debian release itself will not be automatically upgraded.
        "origin=Debian,codename=${distro_codename}-updates";
//      "origin=Debian,codename=${distro_codename}-proposed-updates";
        "origin=Debian,codename=${distro_codename},label=Debian";
        "origin=Debian,codename=${distro_codename},label=Debian-Security";

        // Archive or Suite based matching:
        // Note that this will silently match a different release after
        // migration to the specified archive (e.g. testing becomes the
        // new stable).
//      "o=Debian,a=stable";
//      "o=Debian,a=stable-updates";
//      "o=Debian,a=proposed-updates";
//      "o=Debian Backports,a=${distro_codename}-backports,l=Debian Backports";
};

// Python regular expressions, matching packages to exclude from upgrading
Unattended-Upgrade::Package-Blacklist {
    // The following matches all packages starting with linux-
//  "linux-";

    // Use $ to explicitely define the end of a package name. Without
    // the $, "libc6" would match all of them.
//  "libc6$";
//  "libc6-dev$";
//  "libc6-i686$";

    // Special characters need escaping
//  "libstdc\+\+6$";

    // The following matches packages like xen-system-amd64, xen-utils-4.1,
    // xenstore-utils and libxenstore3.0
//  "(lib)?xen(store)?";

    // For more information about Python regular expressions, see
    // https://docs.python.org/3/howto/regex.html
};

// This option allows you to control if on a unclean dpkg exit
// unattended-upgrades will automatically run 
//   dpkg --force-confold --configure -a
// The default is true, to ensure updates keep getting installed
//Unattended-Upgrade::AutoFixInterruptedDpkg "true";

// Split the upgrade into the smallest possible chunks so that
// they can be interrupted with SIGTERM. This makes the upgrade
// a bit slower but it has the benefit that shutdown while a upgrade
// is running is possible (with a small delay)
//Unattended-Upgrade::MinimalSteps "true";

// Install all updates when the machine is shutting down
// instead of doing it in the background while the machine is running.
// This will (obviously) make shutdown slower.
// Unattended-upgrades increases logind's InhibitDelayMaxSec to 30s.
// This allows more time for unattended-upgrades to shut down gracefully
// or even install a few packages in InstallOnShutdown mode, but is still a
// big step back from the 30 minutes allowed for InstallOnShutdown previously.
// Users enabling InstallOnShutdown mode are advised to increase
// InhibitDelayMaxSec even further, possibly to 30 minutes.
//Unattended-Upgrade::InstallOnShutdown "false";

// Send email to this address for problems or packages upgrades
// If empty or unset then no email is sent, make sure that you
// have a working mail setup on your system. A package that provides
// 'mailx' must be installed. E.g. "[email protected]"
Unattended-Upgrade::Mail "root";

// Set this value to "true" to get emails only on errors. Default
// is to always send a mail if Unattended-Upgrade::Mail is set
//Unattended-Upgrade::MailOnlyOnError "false";

// Remove unused automatically installed kernel-related packages
// (kernel images, kernel headers and kernel version locked tools).
//Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";

// Do automatic removal of newly unused dependencies after the upgrade
//Unattended-Upgrade::Remove-New-Unused-Dependencies "true";

// Do automatic removal of unused packages after the upgrade
// (equivalent to apt-get autoremove)
Unattended-Upgrade::Remove-Unused-Dependencies "true";

// Automatically reboot *WITHOUT CONFIRMATION* if
//  the file /var/run/reboot-required is found after the upgrade
Unattended-Upgrade::Automatic-Reboot "true";

// Automatically reboot even if there are users currently logged in
// when Unattended-Upgrade::Automatic-Reboot is set to true
//Unattended-Upgrade::Automatic-Reboot-WithUsers "true";

// If automatic reboot is enabled and needed, reboot at the specific
// time instead of immediately
//  Default: "now"
Unattended-Upgrade::Automatic-Reboot-Time "02:00";

// Only run upgrades on Fridays (days are names or numbers; Sunday is 0)
Unattended-Upgrade::Update-Days {"5";};

// Use apt bandwidth limit feature, this example limits the download
// speed to 70kb/sec
//Acquire::http::Dl-Limit "70";

// Enable logging to syslog. Default is False
// Unattended-Upgrade::SyslogEnable "false";

// Specify syslog facility. Default is daemon
// Unattended-Upgrade::SyslogFacility "daemon";

// Download and install upgrades only on AC power
// (i.e. skip or gracefully stop updates on battery)
// Unattended-Upgrade::OnlyOnACPower "true";

// Download and install upgrades only on non-metered connection
// (i.e. skip or gracefully stop updates on a metered connection)
// Unattended-Upgrade::Skip-Updates-On-Metered-Connections "true";

// Verbose logging
// Unattended-Upgrade::Verbose "false";

// Print debugging information both in unattended-upgrades and
// in unattended-upgrade-shutdown
// Unattended-Upgrade::Debug "false";

/etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::CleanInterval "7";

  1. install mailer

root@data1:~# hostname > /etc/mailname 
root@data1:~# DEBIAN_FRONTEND=noninteractive apt-get install -y opensmtpd
root@data1:~# echo [email protected] >> ~root/.forward

test mailer:

in one terminal:

# journalctl -f -u opensmtpd

With the help of https://www.mail-tester.com/, in another:

root@data3:~# mail -s "testing outgoing" [email protected]
Cc: 
Hi there, will this go through?

opensmtpd logs say:

May 21 01:16:43 data3.praxis.kousu.ca smtpd[5954]: 84dff16a26522d0b smtp event=connected address=local host=data3.praxis.kousu.ca
May 21 01:16:43 data3.praxis.kousu.ca smtpd[5954]: 84dff16a26522d0b smtp event=message address=local host=data3.praxis.kousu.ca msgid=1eb206e1 from=<[email protected]> to=<[email protected]> size=471 ndest=1 proto=ESMTP
May 21 01:16:43 data3.praxis.kousu.ca smtpd[5954]: 84dff16a26522d0b smtp event=closed address=local host=data3.praxis.kousu.ca reason=quit
May 21 01:16:44 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=connecting address=smtp+tls://94.23.206.89:25 host=mail-tester.com
May 21 01:16:44 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=connected
May 21 01:16:45 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=starttls ciphers=version=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256
May 21 01:16:45 data3.praxis.kousu.ca smtpd[5954]: smtp-out: Server certificate verification succeeded on session 84dff16ec08d0171

May 21 01:16:46 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=delivery evpid=1eb206e1eff6a367 from=<[email protected]> to=<[email protected]> rcpt=<-> source="149.248.50.100" relay="94.23.206.89 (mail-tester.com)" delay=3s result="Ok" stat="250 2.0.0 Ok: queued as 3B00EA0237"
May 21 01:16:56 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=closed reason=quit messages=1

and it got to mail-tester.com; but mail-tester scored it low because the DNS needs work:

  • need to fix the reverse DNS at vultr
  • need to add an SPF record
  • need to add an MX record
  1. install, uh, netdata?
  2. letsencrypt
  3. nginx
  4. nextcloud
    7. [ ] configure nextcloud using occ

@kousu
Member Author

kousu commented May 23, 2021

mailer needs: sudo ufw allow 25/tcp. This must be an upgrade on Vultr's end since the last time I set up servers.

@kousu
Member Author

kousu commented May 29, 2021

la la la

minutes

Alternatives:

  • zfs
    • we're not the only ones to have this thought
    • pros:
      • solid codebase
      • Copy-on-Write log-structured thing
    • cons:
      • probably not good for data sharing: multiple users can't share a dataset? Or if they can, it requires connecting to the host server to set the version, using CLI tools

@jcohenadad
Member

probably not good for data sharing; multiple users can't share a dataset? or if they can, requires connecting to the host server to set the version, using CLI tools

I'm not sure I understand that. For example, in the context of NeuroPoly's internal data (which are currently versioned/distributed with git-annex), would it be considered "one user sharing a dataset"? And if so, would ZFS be limited for this specific use-case?

@kousu
Member Author

kousu commented May 30, 2021

My (weak) understanding is that with zfs you have to do:

  • git commit ~= sudo zfs snapshot $ZFS_ROOT@$VERSION
  • git checkout $VERSION ~= sudo mount -t zfs $ZFS_ROOT@$VERSION $MOUNTPOINT

So actually, yes, a single zfs instance can be shared with multiple users, so long as everyone has a) direct ssh access to the data server b) sudo rights on that data server.

Alternately, an admin (i.e. you or me or Alex) could ssh in, mount a snapshot, and expose it more safely to users over afp://, smb://, nfs://, sshfs://. But then users need to be constantly coordinating with their sysadmins. Maybe that's okay for slow-moving datasets like the envisioned Praxis system but it would be pretty awkward for daily use here.

@kousu
Member Author

kousu commented May 30, 2021

Although there is this: on Linux, you can do 'checkout' without mounting like this:

# cd $MOUNTPOINT/.zfs/snapshot/$VERSION

You still need sudo to make a commit; probably to make any change at all. But I guess we could make this work if the goal is just reproducibility?
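
To make the git analogy concrete, here is a rough sketch of "commit", "log" and "checkout" with ZFS snapshots. The dataset `tank/praxis` and its mountpoint are hypothetical names; the one thing to note is that ZFS writes snapshot names as `dataset@label`:

```shell
#!/bin/sh
# Hypothetical dataset and mountpoint; adjust to the real pool layout.
DATASET=${DATASET:-tank/praxis}
MOUNTPOINT=${MOUNTPOINT:-/tank/praxis}

# Full snapshot name for a version label: ZFS writes it as dataset@label.
snapshot_name() {
    printf '%s@%s\n' "$DATASET" "$1"
}

# Read-only directory where a snapshot's files show up without any mount step.
snapshot_path() {
    printf '%s/.zfs/snapshot/%s\n' "$MOUNTPOINT" "$1"
}

# "commit": needs root (or a `zfs allow` delegation) on the data server.
zfs_commit() {
    sudo zfs snapshot "$(snapshot_name "$1")"
}

# "log": list the snapshots that exist for the dataset.
zfs_log() {
    zfs list -t snapshot -r "$DATASET"
}

# "checkout": just change directory into the read-only view.
zfs_checkout() {
    cd "$(snapshot_path "$1")" || return
}
```

For example, `zfs_commit v1.0` followed by `zfs_checkout v1.0` would leave the shell in a frozen copy of the dataset as of that snapshot.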

@jcohenadad
Member

jcohenadad commented Jun 25, 2021

It looks like we are going towards a centralized solution. In brief:

  • each participating site will curate their data into BIDS and send anonymized data to the centralized server
  • a mechanism will need to be put in place for participating sites to push data (and for the centralized server to review what is pushed)
  • a web interface (e.g. a portal) will display the available datasets with instructions for download
  • a mechanism is needed to allow only people with certain permissions to download the data (e.g. a token distributed alongside IRB/NDA approval?)
  • the centralized server will run a cron job to ensure BIDS compliance

Question is: where to host this centralized server
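
The cron-based compliance check could look roughly like the sketch below. The dataset root, the script path in the crontab line and the choice of `bids-validator` as the checker are all assumptions; any command that exits non-zero on failure would slot in:

```shell
#!/bin/sh
# Sketch of a nightly compliance check; paths and the validator command are assumptions.
DATA_ROOT=${DATA_ROOT:-/srv/datasets}     # hypothetical: one subdirectory per dataset
VALIDATOR=${VALIDATOR:-bids-validator}    # any command that exits non-zero on failure

# Run the validator on every dataset directory and report the ones that fail.
# Returns success only if every dataset passed.
check_all_datasets() {
    failed=0
    for dataset in "$DATA_ROOT"/*/; do
        [ -d "$dataset" ] || continue
        if ! "$VALIDATOR" "$dataset" >/dev/null 2>&1; then
            echo "NOT BIDS-compliant: $dataset"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Example crontab entry (hypothetical script path), running daily at 03:00:
#   0 3 * * * /usr/local/bin/check-bids.sh
```

Because cron mails whatever a job prints, a wrapper script that calls `check_all_datasets` would stay silent on success and only produce output when a dataset fails.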

@kousu
Member Author

kousu commented Jul 7, 2021

It's Demo day:

We have been allocated an account on https://arbutus.cloud.computecanada.ca/. Docs at https://docs.computecanada.ca/wiki/Cloud_Quick_Start.

I'm going to install GIN on it: https://gin.g-node.org/G-Node/Info/wiki/In+House

Let's see how fast I can do this.

@kousu
Member Author

kousu commented Jul 7, 2021

  1. auth: log in with username/pass at https://arbutus.cloud.computecanada.ca/
  2. auth: xclip -selection clipboard ~/.ssh/id_ed25519.neuropoly.pub and paste into https://arbutus.cloud.computecanada.ca/project/key_pairs -> Import Public Key; repeat for @jcohenadad's key from ansible
  3. Allocate VM: https://arbutus.cloud.computecanada.ca/project/instances/ -> "Launch Instance" ->
    • Name = praxis-gin
    • Availability Zone = persistent_01
    • Source = Ubuntu-20.04.2-Focal-minimal-x64-2021-05, flavor = p1-1.5gb
    • everything else at defaults
    • "Launch Instance"
  4. Networking: https://arbutus.cloud.computecanada.ca/project/instances/ -> praxis-gin -> Actions -> Associate Floating IP
    • + -> Manage Floating IP
      • DNS Domain = praxis.kousu.ca, DNS Name = data1.praxis.kousu.ca (these are not long term names; just repurposing my demo)
      • Allocate IP
      • "Error: Unable to allocate Floating IP. " :/
      • DNS Domain = "", DNS Name = ""
      • Allocate IP
      • 206.12.93.20 is allocated
    • "Associate"

2021-07-07-121828_1672x153_scrot

  5. Test connection:
[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly [email protected]
The authenticity of host '206.12.93.20 (206.12.93.20)' can't be established.
ED25519 key fingerprint is SHA256:Z82G+UO/D3ZRJV53eQeaVt2rWSaVFmhLcEwbHO519Ig.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '206.12.93.20' (ED25519) to the list of known hosts.
[email protected]: Permission denied (publickey).
[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly [email protected]
[email protected]: Permission denied (publickey).

hm okay what's wrong?

Okay docs say I should be using the username "ubuntu". That doesn't work either.

@kousu
Member Author

kousu commented Jul 7, 2021

It seems like it just hung? I deleted and remade the instance with

Name = praxx
Availability Zone = any (the docs said I shouldn't have changed this)
Source = Ubuntu-20.04.2-Focal-x64-2021-05
flavor = p2-3gb
everything else at defaults

I still can't get in though:

[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly [email protected]
The authenticity of host '206.12.93.20 (206.12.93.20)' can't be established.
ED25519 key fingerprint is SHA256:nAE3NfUZ1R6uSdr3GUeuJPJ1gENAQdexM29r0EM8vxs.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '206.12.93.20' (ED25519) to the list of known hosts.
[email protected]: Permission denied (publickey).
[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa [email protected]
id_rsa         id_rsa.github  
[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa [email protected]
[email protected]: Permission denied (publickey).

Oh I see what the problem is:

Paste your public key (only RSA type SSH keys are currently supported).

drat.

But I...added my rsa key? And it's still not working?

[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa [email protected]
[email protected]: Permission denied (publickey).

hm.

The system log (which openstack will show you) says

[ 43.704728] cloud-init[1249]: ci-info: no authorized SSH keys fingerprints found for user ubuntu.

so, hm. Why?

Oh I missed this step:

Key Pair: From the Available list, select the SSH key pair you created earlier by clicking the upwards arrow on the far right of its row. If you do not have a key pair, you can create or import one from this window using the buttons at the top of the window (please see above). For more detailed information on managing and using key pairs see SSH Keys.

@kousu
Member Author

kousu commented Jul 7, 2021

Delete and recreate with

Name = praxis-gin
Availability Zone = any (the docs said I shouldn't have changed this)
Source = Ubuntu-20.04.2-Focal-x64-2021-05
flavor = p2-3gb
keypair = nguenthe-requiem-rsa

It only allows you to init with a single keypair! Ah.
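
For next time, the same instance could probably be created from the command line instead of the web UI, which makes the key pair requirement harder to miss. This is a sketch only: it assumes `python-openstackclient` is installed and an OpenStack RC file has been sourced, and `NETWORK` is a hypothetical name for the external network. It prints the commands by default rather than running them:

```shell
#!/bin/sh
# Sketch: create the instance with the OpenStack CLI instead of the web UI.
# NETWORK is a hypothetical name; the other defaults come from the notes above.
NAME=${NAME:-praxis-gin}
IMAGE=${IMAGE:-Ubuntu-20.04.2-Focal-x64-2021-05}
FLAVOR=${FLAVOR:-p2-3gb}
KEYPAIR=${KEYPAIR:-nguenthe-requiem-rsa}
NETWORK=${NETWORK:-Public-Network}
FLOATING_IP=${FLOATING_IP:-206.12.93.20}  # in practice: whatever address gets allocated

# With DRY_RUN=1 (the default) commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

create_instance() {
    # Register the RSA public key (only RSA keys were accepted at the time).
    run openstack keypair create --public-key "$HOME/.ssh/id_rsa.pub" "$KEYPAIR"
    # Name the key pair at creation time, otherwise cloud-init installs no key.
    run openstack server create --image "$IMAGE" --flavor "$FLAVOR" --key-name "$KEYPAIR" "$NAME"
    # Allocate a floating IP, then attach it to the instance.
    run openstack floating ip create "$NETWORK"
    run openstack server add floating ip "$NAME" "$FLOATING_IP"
}
```

Running `create_instance` as-is just lists the four commands; `DRY_RUN=0 create_instance` would execute them, and may need extra options (such as a network for the server) depending on the project.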

@kousu
Member Author

kousu commented Jul 7, 2021

Got in:

[kousu@requiem ~]$ ssh-keygen -R 206.12.93.20
# Host 206.12.93.20 found: line 119
/home/kousu/.ssh/known_hosts updated.
Original contents retained as /home/kousu/.ssh/known_hosts.old
[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa ubuntu@206.12.93.20
The authenticity of host '206.12.93.20 (206.12.93.20)' can't be established.
ED25519 key fingerprint is SHA256:qJO/JofxCKeaGD71R5fxkGYlPBFAjfPOOPeeiWByqUc.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '206.12.93.20' (ED25519) to the list of known hosts.
Enter passphrase for key '/home/kousu/.ssh/id_rsa': 
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-73-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Jul  7 16:42:01 UTC 2021

  System load:  0.3               Processes:             123
  Usage of /:   6.5% of 19.21GB   Users logged in:       0
  Memory usage: 6%                IPv4 address for ens3: 192.168.233.67
  Swap usage:   0%

1 update can be applied immediately.
To see these additional updates run: apt list --upgradable


The list of available updates is more than a week old.
To check for new updates run: sudo apt update


The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@praxis-gin:~$ sudo ls
ubuntu@praxis-gin:~$ 

@kousu
Member Author

kousu commented Jul 7, 2021

  1. basic system updates:
    ubuntu@praxis-gin:~$ sudo apt-get update && sudo DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y
    
  2. auth (again):
[kousu@requiem ~]$ ssh root@joplin -- cat '~/.ssh/authorized_keys' | ssh [email protected] -- sudo tee -a '/root
[kousu@requiem ~]$ ssh root@joplin -- cat '~/.ssh/authorized_keys' | ssh [email protected] -- tee -a '~/.ssh/authorized_keys'

test: root@

[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly root@206.12.93.20
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-73-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Jul  7 16:49:02 UTC 2021

  System load:  0.05              Processes:             128
  Usage of /:   9.2% of 19.21GB   Users logged in:       1
  Memory usage: 10%               IPv4 address for ens3: 192.168.233.67
  Swap usage:   0%


0 updates can be applied immediately.


*** System restart required ***
Last login: Wed Jul  7 16:48:06 2021 from 104.163.172.27
root@praxis-gin:~# logout
Connection to 206.12.93.20 closed.

ubuntu@

[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly ubuntu@206.12.93.20
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-73-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Jul  7 16:49:08 UTC 2021

  System load:  0.04              Processes:             127
  Usage of /:   9.2% of 19.21GB   Users logged in:       1
  Memory usage: 10%               IPv4 address for ens3: 192.168.233.67
  Swap usage:   0%


0 updates can be applied immediately.


*** System restart required ***
Last login: Wed Jul  7 16:42:04 2021 from 104.163.172.27
  3. finish updates: ubuntu@praxis-gin:~$ sudo reboot

  4. Log back in

  5. Install docker, since that's how GIN is packaged:

ubuntu@praxis-gin:~$ sudo apt-get install docker.io
ubuntu@praxis-gin:~$ sudo systemctl enable --now docker
ubuntu@praxis-gin:~$ sudo usermod -a -G docker ubuntu # grant rights
ubuntu@praxis-gin:~$ logout
Connection to 206.12.93.20 closed.

Test:

[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa ubuntu@206.12.93.20
Enter passphrase for key '/home/kousu/.ssh/id_rsa': 
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-77-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Jul  7 16:53:32 UTC 2021

  System load:  0.67               Processes:                119
  Usage of /:   11.2% of 19.21GB   Users logged in:          0
  Memory usage: 9%                 IPv4 address for docker0: 172.17.0.1
  Swap usage:   0%                 IPv4 address for ens3:    192.168.233.67


0 updates can be applied immediately.


Last login: Wed Jul  7 16:50:59 2021 from 104.163.172.27
ubuntu@praxis-gin:~$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

@kousu
Member Author

kousu commented Jul 7, 2021

Start following https://gin.g-node.org/G-Node/Info/wiki/In+House:

  1. Install: ubuntu@praxis-gin:~$ docker pull gnode/gin-web:live

  2. firewall again: GIN wants port 3000 and 2222 open, so: https://arbutus.cloud.computecanada.ca/project/security_groups -> Manage -> Add rules for 3000 and 2222 ingress
    2021-07-07-125804_1663x100_scrot

  3. Run it: ubuntu@praxis-gin:~$ docker run -p 3000:3000 -p 2222:22 -d gnode/gin-web:live (NOTE: small bug in the instructions: they tell you to pull the :live tag but then to run the image with no tag, which in docker implies :latest).

  4. Test it seems to be up:

the ports are listening:

ubuntu@praxis-gin:~$ sudo netstat -nlpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:2222            0.0.0.0:*               LISTEN      3520/docker-proxy   
tcp        0      0 127.0.0.1:44435         0.0.0.0:*               LISTEN      1376/containerd     
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      536/systemd-resolve 
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      611/sshd: /usr/sbin 
tcp        0      0 0.0.0.0:3000            0.0.0.0:*               LISTEN      3507/docker-proxy   
tcp6       0      0 :::22                   :::*                    LISTEN      611/sshd: /usr/sbin 
[kousu@requiem ~]$ ssh -p 2222 206.12.93.20 
The authenticity of host '[206.12.93.20]:2222 ([206.12.93.20]:2222)' can't be established.
ED25519 key fingerprint is SHA256:41ELnYTqwKKUzA9zMFSopXmi953gc+ZGco9f4vqvF3g.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[206.12.93.20]:2222' (ED25519) to the list of known hosts.
kousu@206.12.93.20: Permission denied (publickey,keyboard-interactive).

(I don't have a key inside of GIN yet, so of course this fails, but it's listening)

  5. Insecurely run the setup process by visiting http://206.12.93.20:3000:

Screenshot 2021-07-07 at 13-01-48 Gogs

I filled out the options like this:

Screenshot 2021-07-07 at 13-11-03 Gogs

  6. Give it a DNS name by logging in to my personal DNS server (I don't have rights to dns://neuro.polymtl.ca) and mapping A data1.praxis.kousu.ca -> 206.12.93.20.

Verify:

[kousu@requiem ~]$ ping data1.praxis.kousu.ca
PING data1.praxis.kousu.ca (206.12.93.20) 56(84) bytes of data.
64 bytes from 206-12-93-20.cloud.computecanada.ca (206.12.93.20): icmp_seq=1 ttl=43 time=86.3 ms
64 bytes from 206-12-93-20.cloud.computecanada.ca (206.12.93.20): icmp_seq=2 ttl=43 time=78.6 ms
^C
--- data1.praxis.kousu.ca ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 78.619/82.471/86.324/3.852 ms

Change the hostname to match:

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ hostname
praxis-gin
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo vi /etc/hostname 
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat /etc/hostname 
data1.praxis.kousu.ca
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo hostname $(cat /etc/hostname)
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ hostname $(cat /etc/hostname)
hostname: you must be root to change the host name
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ hostname
data1.praxis.kousu.ca

  7. [ ] TLS

Okay TLS is always a hoot. Let's see if I can do this in 10 minutes eh? I can front Gogs with an nginx reverse proxy.

Actually I have this already, I can just copy the config out of https://github.com/neuropoly/computers/tree/master/ansible/roles/neuropoly-tls-server

ubuntu@praxis-gin:~$ sudo apt-get install nginx dehydrated

nginx config:

ubuntu@praxis-gin:~$ sudo vi /etc/nginx/sites-available/acme
ubuntu@praxis-gin:~$ cat /etc/nginx/sites-available/acme

server {
    listen      80 default_server;
    listen [::]:80 default_server;

    server_name _;

    # This glues together using both a reverse-proxy over to the dev server, while still letting ACME work
    # https://serverfault.com/questions/768509/lets-encrypt-with-an-nginx-reverse-proxy
    # Notice: this server { } listens to *all* hostnames, so any DNS record pointed at this box can be issued a ACME cert
    location ^~ /.well-known/acme-challenge {
        alias /var/lib/dehydrated/acme-challenges;
    }

    # enforce https
    # so long as this is the only `server{}` run on port 80, all http connections get rewritten to https ones.
    # ($host is pulled from the client's request, along with $request_uri, so this line works for *any* virtual host we care to make)
    location / {
        # 307 is a temporary redirect, to avoid causing bugs due to browser caching while developing this ability
        # but 301 would be more efficient in the long term
        return 307 https://$host$request_uri;
    }
}

server {
    # this is a copy of what's in "snippets/ssl.conf", but without claiming 'default_server'
    # it is necessary in order to auto-verify the SSL config after deploying certificates.

    listen 443 ssl;
    listen [::]:443 ssl;

    include "snippets/_ssl.conf";
}
ubuntu@praxis-gin:~$ sudo vi /etc/nginx/snippets/_ssl.conf
ubuntu@praxis-gin:~$ cat /etc/nginx/snippets/_ssl.conf
ssl_certificate /etc/ssl/acme/data1.praxis.kousu.ca/fullchain.pem;
ssl_certificate_key /etc/ssl/acme/data1.praxis.kousu.ca/privkey.pem;

gzip off; # anti-BREACH: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773332

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers "HIGH:!aNULL"; # OpenBSD's recommendation: https://man.openbsd.org/httpd.conf
ssl_prefer_server_ciphers on;
ubuntu@praxis-gin:~$ cd /etc/nginx/sites-enabled/
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo ln -s ../sites-available/acme 
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo rm default 
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ ls -l 
total 0
lrwxrwxrwx 1 root root 23 Jul  7 17:32 acme -> ../sites-available/acme

dehydrated config:

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ hostname | sudo tee /etc/dehydrated/domains.txt
data1.praxis.kousu.ca
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat /etc/dehydrated/conf.d/neuropoly.sh
AUTO_CLEANUP=yes

# TODO: set this to the sysadmin mailing list: https://github.com/neuropoly/computers/issues/39
[email protected]

CERTDIR=/etc/ssl/acme

# it would be nice to use the default more efficient ECDSA keys
#KEY_ALGO=secp384r1
# but netdata is incompatible with them
KEY_ALGO=rsa
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo systemctl restart nginx
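
The transcript skips the step that actually requests the certificate. It was presumably something like the sketch below, which assumes the Debian/Ubuntu dehydrated package defaults (domains read from /etc/dehydrated/domains.txt, challenges written to /var/lib/dehydrated/acme-challenges, matching the nginx `alias` above). It prints the commands by default rather than running them:

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

issue_certificate() {
    # One-time: create an ACME account and accept the CA's terms of service.
    run sudo dehydrated --register --accept-terms
    # Request or renew certificates for every line in domains.txt.
    run sudo dehydrated --cron
    # Check the config, then reload nginx so it picks up the new certificate files.
    run sudo nginx -t
    run sudo systemctl reload nginx
}
```

One ordering caveat: nginx refuses to load a config whose `ssl_certificate` files do not exist yet, so on a fresh machine the port-80 block likely has to be live on its own for the first issuance, with the 443 block enabled afterwards.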

Verify:

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ curl -v https://data1.praxis.kousu.ca
*   Trying 206.12.93.20:443...
* TCP_NODELAY set
* Connected to data1.praxis.kousu.ca (206.12.93.20) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=data1.praxis.kousu.ca
*  start date: Jul  7 16:40:47 2021 GMT
*  expire date: Oct  5 16:40:46 2021 GMT
*  subjectAltName: host "data1.praxis.kousu.ca" matched cert's "data1.praxis.kousu.ca"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
> GET / HTTP/1.1
> Host: data1.praxis.kousu.ca
> User-Agent: curl/7.68.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.18.0 (Ubuntu)
< Date: Wed, 07 Jul 2021 17:43:21 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 21 Apr 2020 14:09:01 GMT
< Connection: keep-alive
< ETag: "5e9efe7d-264"
< Accept-Ranges: bytes
< 
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host data1.praxis.kousu.ca left intact

One thing not in ansible is the gogs reverse proxy part:

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat gogs 
server {

    server_name _;


    listen 443 ssl;
    listen [::]:443 ssl;

    include "snippets/_ssl.conf";

    location / {
            proxy_set_header X-Real-IP $remote_addr;
            proxy_pass http://127.0.0.1:3000/;
    }
}
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat acme 

server {
    listen      80 default_server;
    listen [::]:80 default_server;

    server_name _;

    # This glues together using both a reverse-proxy over to the dev server, while still letting ACME work
    # https://serverfault.com/questions/768509/lets-encrypt-with-an-nginx-reverse-proxy
    # Notice: this server { } listens to *all* hostnames, so any DNS record pointed at this box can be issued a ACME cert
    location ^~ /.well-known/acme-challenge {
        alias /var/lib/dehydrated/acme-challenges;
    }

    # enforce https
    # so long as this is the only `server{}` run on port 80, all http connections get rewritten to https ones.
    # ($host is pulled from the client's request, along with $request_uri, so this line works for *any* virtual host we care to make)
    location / {
        # 307 is a temporary redirect, to avoid causing bugs due to browser caching while developing this ability
        # but 301 would be more efficient in the long term
        return 307 https://$host$request_uri;
    }
}

server {
    # this is a copy of what's in "snippets/ssl.conf", but without claiming 'default_server'
    # it is necessary in order to auto-verify the SSL config after deploying certificates.

    #listen 443 ssl;
    #listen [::]:443 ssl;

    include "snippets/_ssl.conf";
}

NOTE: I disabled ssl in '/etc/nginx/sites-enabled/acme' because it was conflicting with gogs?? I don't know what's up with that. Gotta think through that more. Maybe ansible needs another patch.
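
A possible explanation for the conflict (not verified on this server): when no `server_name` matches a request, nginx falls back to the first `server{}` it loaded for that port. `acme` sorts before `gogs`, and neither block names data1.praxis.kousu.ca (`_` is just a name that never matches), so the TLS block in `acme` would win and serve the stock welcome page. Giving the proxy block an explicit `server_name` should sidestep that. A sketch that prints such a block; the function name and the extra proxy headers are additions for illustration, not what was deployed:

```shell
#!/bin/sh
# Print an nginx server block that proxies one named host to a local backend.
# Usage: render_gogs_vhost DOMAIN PORT
render_gogs_vhost() {
    domain=$1
    port=$2
    cat <<EOF
server {
    listen 443 ssl;
    listen [::]:443 ssl;

    # An explicit name, so this block is selected regardless of file load order.
    server_name $domain;

    include "snippets/_ssl.conf";

    location / {
        # Pass the original host, client address and scheme through to the backend.
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-Proto \$scheme;
        proxy_pass http://127.0.0.1:$port/;
    }
}
EOF
}

# Example:
#   render_gogs_vhost data1.praxis.kousu.ca 3000 | sudo tee /etc/nginx/sites-available/gogs
```

With a named block like this, the 443 listener in `acme` could stay enabled as a fallback for any other hostname pointed at the box.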

It's working now:

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ curl -v https://data1.praxis.kousu.ca
*   Trying 206.12.93.20:443...
* TCP_NODELAY set
* Connected to data1.praxis.kousu.ca (206.12.93.20) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=data1.praxis.kousu.ca
*  start date: Jul  7 16:40:47 2021 GMT
*  expire date: Oct  5 16:40:46 2021 GMT
*  subjectAltName: host "data1.praxis.kousu.ca" matched cert's "data1.praxis.kousu.ca"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
> GET / HTTP/1.1
> Host: data1.praxis.kousu.ca
> User-Agent: curl/7.68.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.18.0 (Ubuntu)
< Date: Wed, 07 Jul 2021 17:46:44 GMT
< Content-Type: text/html; charset=UTF-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Set-Cookie: lang=en-US; Path=/; Max-Age=2147483647
< Set-Cookie: i_like_gogs=4956b07a72ea926c; Path=/; HttpOnly
< Set-Cookie: _csrf=hPHf3Z_mpseJYpknbvtGHoPMh506MTYyNTY4MDAwNDI4Mjk0OTAzMQ; Path=/; Expires=Thu, 08 Jul 2021 17:46:44 GMT
< 
<!DOCTYPE html>
<html>
<head data-suburl="">
	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
	<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
	
		<meta name="author" content="G-Node"/>
		<meta name="description" content="GIN is a data repository for science"/>
		<meta name="keywords" content="gin, data, sharing, science git">
	
	<meta name="referrer" content="no-referrer" />
	<meta name="_csrf" content="hPHf3Z_mpseJYpknbvtGHoPMh506MTYyNTY4MDAwNDI4Mjk0OTAzMQ" />
	<meta name="_suburl" content="" />
	
	
	
		<meta property="og:url" content="http://206.12.93.20:3000/" />
		<meta property="og:type" content="website" />
		<meta property="og:title" content="Gogs">
		<meta property="og:description" content="GIN is a data repository for science">
		<meta property="og:image" content="http://206.12.93.20:3000/img/gogs-lg.png" />
		<meta property="og:site_name" content="GIN">
	

	<link rel="shortcut icon" href="/img/favicon.png" />

	<script src="/js/jquery-1.11.3.min.js"></script>
	<script src="/js/libs/jquery.are-you-sure.js"></script>
	<link rel="stylesheet" href="/assets/font-awesome-4.6.3/css/font-awesome.min.css">
	<link rel="stylesheet" href="/assets/octicons-4.3.0/octicons.min.css">

	
	

	

	
	<link rel="stylesheet" href="/css/semantic-2.3.1.min.css">
	<link rel="stylesheet" href="/css/gogs.css?v=0b67f1414e0be5c3bab3905fe172c80e">
	<noscript>
		<style>
			.dropdown:hover > .menu { display: block; }
			.ui.secondary.menu .dropdown.item > .menu { margin-top: 0; }
		 </style>
	</noscript>

	
	<script src="/js/semantic-2.3.1.min.js"></script>
	<script src="/js/gogs.js?v=0b67f1414e0be5c3bab3905fe172c80e"></script>

	<title>Gogs</title>

	<meta name="theme-color" content="#ff5343">

	
<link rel="stylesheet" href="/css/custom.css">


	<meta name="robots" content="nofollow"/>



<link href='https://fonts.googleapis.com/css?family=Kaushan+Script' rel='stylesheet' type='text/css'>


<meta name="twitter:card" content="summary" />
<meta name="twitter:site" content="@gnode" />
<meta name="twitter:title" content="GIN" />
<meta name="twitter:description" content="Modern Research Data Management for Neuroscience"/>
<meta name="twitter:image" content="https://web.gin.g-node.org/img/favicon.png" />









	<script src="/js/libs/js.cookie.js"></script>
	<div class="ui inline cookie nag">
		<span class="title">We use cookies to ensure you get the best experience on our website</span>
		<i class="nag close icon"></i>
	</div>


</head>
<body>
	<div class="full height">
		<noscript>This website works better with JavaScript</noscript>

		
			<div class="following bar light">
				<div class="ui container">
					<div class="ui grid">
						<div class="column">


							<div class="ui top secondary menu">
								<a class="item brand" href="/">
									<img class="ui mini image" src="/img/favicon.png">
								</a>

								
									<a class="item active" href="/">Home</a>
								

								<a class="item" href="/explore/repos"><i class="octicon octicon-search"></i>Explore</a>
								

								<a class="item" href="/G-Node/info/wiki" rel="noreferrer"><i class="octicon octicon-question"></i>Help</a>
								<a class="item" href="/G-Node/Info/wiki/News">News</a>
								

									<div class="right menu">
										
											<a class="item" href="/user/sign_up">
												<i class="octicon octicon-person"></i> Register
											</a>
										
										<a class="item" href="/user/login?redirect_to=">
											<i class="octicon octicon-sign-in"></i> Sign In
										</a>
									</div>

								
							</div>
						</div>
					</div>
				</div>
			</div>
		

		


<div class="home">
	<div class="ui stackable middle very relaxed page grid">
		<div class="sixteen wide center aligned centered column">
			<div class="logo">
				<img src="/img/favicon.png" />
			</div>
			<div class="hero">
				<h1 class="ui icon header title">
					GIN
				</h1>
				<h2>Modern Research Data Management for Neuroscience</h2>
				<div class="ginsubtitle">...distributed version control, flavoured for science</div>
			</div>
		</div>
	</div>

	<div class="ui stackable middle very relaxed page grid">
		<div class="eight wide center column">
			<h1 class="hero ui icon header">
				<i class="octicon octicon-device-desktop"></i> Manage your research data
			</h1>
			<p class="large">
				Upload your data to private repositories.<br>
				Synchronise across devices.<br>
				Securely access your data from anywhere.
			</p>
		</div>
		<div class="eight wide center column">
			<h1 class="hero ui icon header">
				<i class="octicon octicon-mortar-board"></i> Share your data
			</h1>
			<p class="large">
				Collaborate with colleagues.<br>
				Make your data public.<br>
				Make your data citable with the GIN DOI service.
			</p>
		</div>
	</div>
	<div class="ui stackable middle very relaxed page grid">
		<div class="eight wide center column">
			<h1 class="hero ui icon header">
				<i class="octicon octicon-list-ordered"></i> Version your data
			</h1>
			<p class="large">
				Uploaded files are automatically versioned.<br>
				Retrieve any previously uploaded version of a file.<br>
				Never lose file history.
			</p>
		</div>
		<div class="eight wide center column">
			<h1 class="hero ui icon header">
				<i class="octicon octicon-code"></i> Open Source
			</h1>
			<p class="large">
				Based on open source projects such as <a href="https://git-scm.com/">Git</a>,
				<a href="https://git-annex.branchable.com/">git-annex</a>, and <a href="https://github.com/gogits/gogs">Gogs</a>.
				You can even set it up in your lab!
			</p>
		</div>
		<div class="sixteen wide center aligned centered column">
			<h1 class="hero ui icon header">
				<a href="/G-Node/Info/wiki">I want to know more!</a>
			</h1>
		</div>
	</div>
</div>
	
	</div>
	<footer>
		<div class="ui container">
			<div class="ui center links item brand footertext">
				<a href="http://www.g-node.org"><img class="ui mini footericon" src="https://projects.g-node.org/assets/gnode-bootstrap-theme/1.2.0-snapshot/img/gnode-icon-50x50-transparent.png"/>© 2016-2021 G-Node</a>
				<a href="/G-Node/Info/wiki/about">About</a>
				<a href="/G-Node/Info/wiki/imprint">Imprint</a>
				<a href="/G-Node/Info/wiki/contact">Contact</a>
				<a href="/G-Node/Info/wiki/Terms+of+Use">Terms of Use</a>
				<a href="/G-Node/Info/wiki/Datenschutz">Datenschutz</a>
				
				<div class="ui language bottom floating slide up dropdown link item" data-tooltip="Non-English translations may be incomplete">
					<i class="world icon"></i>
					<div class="text">English</div>
					<div class="menu">
						
							<a class="item active selected" href="#">English</a>
						
							<a class="item " href="?lang=zh-CN">简体中文</a>
						
							<a class="item " href="?lang=zh-HK">繁體中文(香港)</a>
						
							<a class="item " href="?lang=zh-TW">繁體中文(臺灣)</a>
						
							<a class="item " href="?lang=de-DE">Deutsch</a>
						
							<a class="item " href="?lang=fr-FR">français</a>
						
							<a class="item " href="?lang=nl-NL">Nederlands</a>
						
							<a class="item " href="?lang=lv-LV">latviešu</a>
						
							<a class="item " href="?lang=ru-RU">русский</a>
						
							<a class="item " href="?lang=ja-JP">日本語</a>
						
							<a class="item " href="?lang=es-ES">español</a>
						
							<a class="item " href="?lang=pt-BR">português do Brasil</a>
						
							<a class="item " href="?lang=pl-PL">polski</a>
						
							<a class="item " href="?lang=bg-BG">български</a>
						
							<a class="item " href="?lang=it-IT">italiano</a>
						
							<a class="item " href="?lang=fi-FI">suomi</a>
						
							<a class="item " href="?lang=tr-TR">Türkçe</a>
						
							<a class="item " href="?lang=cs-CZ">čeština</a>
						
							<a class="item " href="?lang=sr-SP">српски</a>
						
							<a class="item " href="?lang=sv-SE">svenska</a>
						
							<a class="item " href="?lang=ko-KR">한국어</a>
						
							<a class="item " href="?lang=gl-ES">galego</a>
						
							<a class="item " href="?lang=uk-UA">українська</a>
						
							<a class="item " href="?lang=en-GB">English (United Kingdom)</a>
						
							<a class="item " href="?lang=hu-HU">Magyar</a>
						
							<a class="item " href="?lang=sk-SK">Slovenčina</a>
						
							<a class="item " href="?lang=id-ID">Indonesian</a>
						
							<a class="item " href="?lang=fa-IR">Persian</a>
						
							<a class="item " href="?lang=vi-VN">Vietnamese</a>
						
							<a class="item " href="?lang=pt-PT">Português</a>
						
					</div>
				</div>
			</div>
			<div class="ui center links item brand footertext">
				<span>Powered by:      <a href="https://github.com/gogits/gogs"><img class="ui mini footericon" src="/img/gogs.svg"/></a>         </span>
				<span>Hosted by:       <a href="http://neuro.bio.lmu.de"><img class="ui mini footericon" src="/img/lmu.png"/></a>          </span>
				<span>Funded by:       <a href="http://www.bmbf.de"><img class="ui mini footericon" src="/img/bmbf.png"/></a>         </span>
				<span>Registered with: <a href="http://doi.org/10.17616/R3SX9N"><img class="ui mini footericon" src="/img/re3data_logo.png"/></a>          </span>
				<span>Recommended by:  
					<a href="https://www.nature.com/sdata/policies/repositories#neurosci"><img class="ui mini footericon" src="/img/sdatarecbadge.jpg"/></a>
					<a href="https://fairsharing.org/recommendation/PLOS"><img class="ui mini footericon" src="/img/sm_plos-logo-sm.png"/></a>
					<a href="https://fairsharing.org/recommendation/eLifeRecommendedRepositoriesandStandards"><img class="ui mini footericon" src="/img/elife-logo-xs.fd623d00.svg"/></a>
				</span>
			</div>
		</div>
	</footer>
</body>







<script src="/js/libs/emojify-1.1.0.min.js"></script>
<script src="/js/libs/clipboard-1.5.9.min.js"></script>


</html>

* Connection #0 to host data1.praxis.kousu.ca left intact

And check the user's view (notice the TLS icon is there)

2021-07-07-134816_1527x620_scrot

  1. Disable the port 3000 firewall rule in https://arbutus.cloud.computecanada.ca/project/security_groups/

2021-07-07-135436_1668x205_scrot

  1. Figure out uploading via git.
    Gogs is running ssh on port 2222, which is..weird. But let's see if I can sort that out.

```
[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly -p 2222 git@data1.praxis.kousu.ca
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
PTY allocation request failed on channel 0
Hi there, You've successfully authenticated, but GIN does not provide shell access.
Connection to data1.praxis.kousu.ca closed.
```

GREAT. Now can I make this permanent?

```
[kousu@requiem ~]$ vi ~/.ssh/config
[kousu@requiem ~]$ tail -n 6 ~/.ssh/config

Host match *.praxis.kousu.ca
User git
Port 2222
IdentityFile ~/.ssh/id_ed25519.neuropoly

[kousu@requiem ~]$ ssh data1.praxis.kousu.ca
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
PTY allocation request failed on channel 0
Hi there, You've successfully authenticated, but GIN does not provide shell access.
Connection to data1.praxis.kousu.ca closed.
```

Awesome. Okay, can I use *git* with this?

@kousu
Copy link
Member Author

kousu commented Jul 7, 2021

Let's see if I can mirror our public dataset.

First, download it to my laptop (but not all of it, it's still pretty large; I ^C'd out of it):

[kousu@requiem datalad]$ git clone https://github.com/spine-generic/data-single-subject/
Cloning into 'data-single-subject'...
remote: Enumerating objects: 2360, done.
remote: Counting objects: 100% (2360/2360), done.
remote: Compressing objects: 100% (1209/1209), done.
remote: Total 2360 (delta 857), reused 2147 (delta 680), pack-reused 0
Receiving objects: 100% (2360/2360), 263.57 KiB | 1.68 MiB/s, done.
Resolving deltas: 100% (857/857), done.
[kousu@requiem datalad]$ cd data
bash: cd: data: No such file or directory
[kousu@requiem datalad]$ cd data-single-subject/
[kousu@requiem data-single-subject]$ git annex get .
(scanning for unlocked files...)
get derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-chiba750/anat/sub-chiba750_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-chiba750/anat/sub-chiba750_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chiba750/anat/sub-chiba750_T2w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-douglas/anat/sub-douglas_T1w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-douglas/anat/sub-douglas_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-douglas/anat/sub-douglas_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-douglas/dwi/sub-douglas_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-glen/anat/sub-glen_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-glen/anat/sub-glen_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-glen/anat/sub-glen_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-glen/dwi/sub-glen_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (from amazon...) 

(checksum...) ok                    
get sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (from amazon...) 

(checksum...) ok                    
get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-mgh/anat/sub-mgh_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-mgh/anat/sub-mgh_T2star.nii.gz (from amazon...) 
^C

Okay, now, make a repo on the new server: https://data1.praxis.kousu.ca/repo/create ->

Screenshot 2021-07-07 at 14-02-05 Gogs

Oh here's a bug; drat; I wonder if I can change the hostname gogs knows for itself, or if I need to rebuild it:

Screenshot 2021-07-07 at 14-02-34 jcohen data-single-subject

Screenshot 2021-07-07 at 14-03-07 jcohen data-single-subject

But if I swap in the right URL, and deal with git-annex being awkward, it works:

[kousu@requiem data-single-subject]$ git remote add praxis git@data1.praxis.kousu.ca:/jcohen/data-single-subject.git
[kousu@requiem data-single-subject]$ git push praxis master
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Enumerating objects: 708, done.
Counting objects: 100% (708/708), done.
Delta compression using up to 4 threads
Compressing objects: 100% (407/407), done.
Writing objects: 100% (708/708), 142.68 KiB | 142.68 MiB/s, done.
Total 708 (delta 271), reused 708 (delta 271), pack-reused 0
remote: Resolving deltas: 100% (271/271), done.
To data1.praxis.kousu.ca:/jcohen/data-single-subject.git
 * [new branch]      master -> master
[kousu@requiem data-single-subject]$ git annex copy --to=praxis
(recording state in git...)
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
git-annex: cannot determine uuid for praxis (perhaps you need to run "git annex sync"?)
[kousu@requiem data-single-subject]$ git annex sync
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
commit 
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
ok
pull praxis 
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (2/2), 138 bytes | 138.00 KiB/s, done.
From data1.praxis.kousu.ca:/jcohen/data-single-subject
 * [new branch]      git-annex  -> praxis/git-annex
ok
pull origin 
ok
(merging praxis/git-annex into git-annex...)
(recording state in git...)
push praxis 
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Enumerating objects: 1852, done.
Counting objects: 100% (1852/1852), done.
Delta compression using up to 4 threads
Compressing objects: 100% (760/760), done.
Writing objects: 100% (1851/1851), 126.85 KiB | 15.86 MiB/s, done.
Total 1851 (delta 768), reused 1518 (delta 586), pack-reused 0
remote: Resolving deltas: 100% (768/768), done.
To data1.praxis.kousu.ca:/jcohen/data-single-subject.git
 * [new branch]      git-annex -> synced/git-annex
 * [new branch]      master -> synced/master
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
ok
push origin 
Username for 'https://github.com': ^C
[kousu@requiem data-single-subject]$ git annex copy --to=praxis
copy derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
(to praxis...) 
ok                                
copy derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (to praxis...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_T1w.nii.gz (to praxis...) 
ok                                   
copy sub-chiba750/anat/sub-chiba750_T2star.nii.gz (to praxis...) 
ok                                 
copy sub-chiba750/anat/sub-chiba750_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (to praxis...) 
ok                                  
copy sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (to praxis...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (to praxis...) 
ok                                
copy sub-douglas/anat/sub-douglas_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-douglas/anat/sub-douglas_T2star.nii.gz (to praxis...) 
ok                                
copy sub-douglas/anat/sub-douglas_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-douglas/dwi/sub-douglas_dwi.nii.gz (to praxis...) 
ok                                 
copy sub-glen/anat/sub-glen_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-glen/anat/sub-glen_T2star.nii.gz (to praxis...) 
ok                                
copy sub-glen/anat/sub-glen_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-glen/dwi/sub-glen_dwi.nii.gz (to praxis...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (to praxis...) 
ok                                   
copy sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (to praxis...) 
ok                                 
copy sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (to praxis...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (to praxis...) 
ok                                
copy sub-mgh/anat/sub-mgh_T1w.nii.gz (to praxis...) 
ok                                 
(recording state in git...)
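
The repeated passphrase prompts in the transcript above can probably be avoided by loading the key into an ssh agent once per session. A minimal sketch, assuming OpenSSH's `ssh-agent` and `ssh-add` are installed; the helper name `use_agent_for_key` is made up for illustration:

```shell
# Hypothetical helper: unlock an ssh key once so git and git-annex stop prompting.
use_agent_for_key() {
    key="$1"
    # Start an agent only if this shell does not already have one.
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        eval "$(ssh-agent -s)" >/dev/null
    fi
    # Asks for the passphrase a single time; later ssh connections reuse the agent.
    ssh-add "$key"
}

# Usage (prompts once for the passphrase):
#   use_agent_for_key ~/.ssh/id_ed25519.neuropoly
#   git annex copy --to=praxis
```

The warning in the output also points at `annex.sshcaching`; running `git config annex.sshcaching true` in the repository may cut down the number of separate ssh connections git-annex opens.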

@kousu
Copy link
Member Author

kousu commented Jul 7, 2021

the port 2222 problem

Using a non-standard ssh port is a problem. I know of six solutions:

export GIT_SSH_COMMAND="ssh -p 2222"

or

GIT_SSH_COMMAND="ssh -p 2222" git <subcommand> ....

Each user has to do this every time they use the server.
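
A related variant that avoids the environment variable is to put the port into the remote URL itself, since git accepts `ssh://user@host:port/path` URLs. A small sketch; the helper name `gin_remote_url` is hypothetical, and the `git` user and repository path are taken from the session earlier in this thread:

```shell
# Hypothetical helper: build an ssh:// remote URL that carries the port.
gin_remote_url() {
    host="$1"
    port="$2"
    path="$3"
    # Drop a leading slash so the URL has exactly one between port and path.
    printf 'ssh://git@%s:%s/%s\n' "$host" "$port" "${path#/}"
}

# Usage:
#   git remote add praxis "$(gin_remote_url data1.praxis.kousu.ca 2222 jcohen/data-single-subject.git)"
```

The port then travels with the remote, but every user still has to copy the longer URL correctly.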

  1. ssh_config

Each user adds this once on each new machine, at the same time as they set up their ssh key.

cat >> ~/.ssh/config <<EOF
Host *.praxis.kousu.ca
Port 2222
EOF
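
Because `cat >>` appends every time it runs, re-running that snippet would duplicate the block. A slightly more careful sketch that only appends once; the function name `add_praxis_host` is hypothetical, and `User git` is carried over from the config shown earlier in this thread:

```shell
# Hypothetical helper: add a Host block for the GIN server to an ssh config, once.
add_praxis_host() {
    config="${1:-$HOME/.ssh/config}"
    # Skip if the block is already present, so re-running is harmless.
    if grep -qsF 'Host *.praxis.kousu.ca' "$config"; then
        return 0
    fi
    mkdir -p "$(dirname "$config")"
    cat >> "$config" <<'EOF'

Host *.praxis.kousu.ca
    User git
    Port 2222
EOF
}

# Usage:
#   add_praxis_host            # edits ~/.ssh/config
#   ssh -G data1.praxis.kousu.ca | grep -i '^port '   # should report port 2222
```
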
  1. Use a separate IP address

ComputeCanada lets us allocate multiple IP addresses per machine. The sign-up form asks if you want 1 or 2. If we had a second IP address, we could bind one of them to the underlying OS and the other to GIN.

Here's someone claiming to do this with Gitlab+docker: https://serverfault.com/a/951985

  1. Put the OS sshd on a different port

Just swap the two ports:

  • edit /etc/ssh/sshd_config to set Port 2222
  • change the docker run line to docker run -p 3000:3000 -p 22:22 -d gnode/gin-web:live

Then the sysadmins need to know to use

ssh -p 2222 <admin-user>@data1.praxis.kousu.ca

when they need to log in to fix something. That will hopefully be pretty rare, though. They could even do this:

cat >> ~/.ssh/config <<EOF
Host sysadmin-data1.praxis
HostName data1.praxis.kousu.ca
Port 2222
EOF

And users don't need to do anything special.
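
The host-side half of the swap could be scripted roughly like this. It is only a sketch: `set_sshd_port` is a made-up helper, it assumes GNU sed, and the service name in the usage comment (`ssh` on Debian/Ubuntu) may differ on other systems.

```shell
# Hypothetical helper: point an sshd_config file at a new port.
set_sshd_port() {
    conf="$1"
    port="$2"
    if grep -qE '^#?Port ' "$conf"; then
        # Rewrite the existing (possibly commented-out) Port line in place.
        sed -i -E "s/^#?Port .*/Port $port/" "$conf"
    else
        # No Port line at all: append one.
        printf 'Port %s\n' "$port" >> "$conf"
    fi
}

# Usage, as root on the server (sshd -t checks the config before restarting):
#   set_sshd_port /etc/ssh/sshd_config 2222 && sshd -t && systemctl restart ssh
```

Keeping the current ssh session open while testing a second login on port 2222 avoids locking yourself out, and the security group also has to allow the new port.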

  1. Install GIN without docker

The docker image comes with a built in ssh server. If we install GIN on the base system and share the system ssh there won't be a second port to worry about.

This is more work because it requires rebuilding their package in a non-docker way. It's my preference though. I would like to build a .deb so you can "apt-get install gin" and have everything Just Work.

We could also make this package deploy dehydrated and nginx as above, to save even more time for the users.

  1. Gogs has a semi-official workaround at http://www.ateijelo.com/blog/2016/07/09/share-port-22-between-docker-gogs-ssh-and-local-system

@kousu
Copy link
Member Author

kousu commented Jul 7, 2021

Demo day went well by the way. https://praxisinstitute.org/ seems happy to investigate this direction.

@kousu
Copy link
Member Author

kousu commented Jul 7, 2021

Shortcuts taken that should be corrected:

  • I didn't deploy the dehydrated cronjobs to renew certs
  • I didn't deploy unattended-upgrades to keep the system up to date
  • I haven't tested if docker will survive a reboot
  • I told it the wrong hostname (it knows itself as localhost and by its IP address, not by data1.praxis.kousu.ca)
  • It has no email server, so it can't send password resets

All of these could be fixed quickly by bringing this server under ansible, but I wrote into the ansible scripts the assumption that all servers are under *.neuro.polymtl.ca, so I'd need to fix that first.

Also

@taowa
Copy link
Contributor

taowa commented Nov 20, 2021

I've been working on adding this to the lab's configuration management today (for those who have access, that's at https://github.com/neuropoly/computers/pull/227). To that end, I'm re-purposing the resources allocated to praxis-gin to be for vaughan-test.neuropoly.org, which will be our dev server for data.neuropoly.org.

@kousu
Copy link
Member Author

kousu commented Nov 20, 2021

And https://data.neuropoly.org will be just as good a demo for Praxis the next time we talk with them. And with the ansible work you're doing it will even be reproducible for them to build their own https://data.praxisinstitute.org :)

@kousu
Copy link
Member Author

kousu commented Dec 17, 2021

Some competitors:

Possible collaborators:

Portals (where we could potentially get ourselves listed, especially if we help them out by making sure we have APIs available):

@kousu
Copy link
Member Author

kousu commented Jan 9, 2022

Part of our promise of standards-compliant security was to run fail2ban but maybe pam_faillock is an easier choice (https://github.com/neuropoly/computers/issues/168#issuecomment-1008239662). But perhaps not.

@kousu
Copy link
Member Author

kousu commented Jan 31, 2022

We had a meeting with Praxis today:

  • The research centers in Vancouver, Toronto and Calgary have gotten ethics approval to upload to the server we're making
  • The other 4 research centers are bogged down in ethics
  • For a minimum-viable product, we'll have a single server and we'll use Gitea Organizations to do inter/intra-research center access control
  • We're aiming to have something in production by March.

Some other related work that came up:

  • FHIR is a spec from HL7 for exchanging medical data; it's comparable to git-annex + BIDS as an ensemble.

@kousu
Copy link
Member Author

kousu commented Jan 31, 2022

@taowa is going to contact ComputeCanada asking them to extend our allocation on https://docs.computecanada.ca/wiki/Cloud_resources#Arbutus_cloud_.28arbutus.cloud.computecanada.ca.29 from 2 VPSes to 3 -- one for data-test.neuropoly.org, one for data.neuropoly.org, and one for data.praxisinstitute.org.

@kousu
Copy link
Member Author

kousu commented Apr 28, 2022

We've done a lot of work on this in our private repo at https://github.com/neuropoly/computers/issues/167 (the praxis-specific part is at https://github.com/neuropoly/computers/pull/332). We've got an ansible deployment and a fork of gitea (neuropoly/gitea#1), and have a demo server at https://data.praxisinstitute.org.dev.neuropoly.org/. Eventually we will want to extract those ansible scripts and publish them on Galaxy.

Today we talked to Praxis and David Cadotte again and got an update on how their data negotiations are going:

  • Ottawa (n = 11); curator: ?; Data Sharing Agreement* Signed; data BIDS-ified
    • in a week or two will have a dataset we can trial uploading; meaning we need to have data.praxisinstitute.org live and in production by then
  • Halifax (n = 20); curator: ?; data BIDS-ified?
  • Calgary (n = 9); curator: ?; data BIDS-ified?
  • Montreal (n = 68); curator: ?
  • Vancouver: waiting on ethics approval
  • Hamilton (n > 150): waiting on ethics approval
  • Quebec City: not yet started

Each site is very different and needs help adapting to their environment; they have different PACS, different OSes, different levels of familiarity with the command line. David has been spending time giving tech support to some of the sites' curators to help get their data in BIDS format. We have created a wiki here to gather the information David has been teaching and anything we learn during our trial dataset uploads; it's here on GitHub but could be migrated to https://data.praxisinstitute.org once that's live (and eventually perhaps these docs could even be rolled into the ansible deployment, as a standard part of Neurogitea?).

We will be in touch with Praxis's IT team in the next couple of weeks so we can migrate https://data.praxisinstitute.org.dev.neuropoly.org -> https://data.praxisinstitute.org.

@mguaypaq
Copy link
Member

mguaypaq commented May 6, 2022

We got some branding feedback from Praxis Institute for the soon-to-be https://data.praxisinstitute.org:

We have received some feedback from our director of marketing, and she really liked the website header colors (no need to change the text). She did provide couple of suggestions:

  1. The logo resolution is appearing quite low on the website header (compared to the tagline text), could you please adjust it to look more as in the svg logo file?
  2. Is it possible to use a word mark for Neurogitea? Something similar to the one attached would be great!
  3. For paragraph text, will it be possible to have a white or very light background with dark text colour (black, dark grey, etc.)?

(Note that the current demo simply uses the arc-green theme which comes bundled with Gitea, as described in this section of the Gitea customization docs.)

  • For point 1, it was pixelated because we had been using an earlier PNG version of the logo, but we have a nice SVG version now. I'm attaching two slightly modified versions of the logo here:
    • praxis.svg is a version with black text, suitable for use on a light background (for example, the big main logo)
    • praxis-rev.svg is a version with white text, suitable for use on a dark background (for example, in the page header if that stays dark).
  • For point 2, I managed to get an SVG version of the wordmark (that is, the word "Neurogitea" but in a fancy font) which doesn't depend on specific fonts being installed on the viewer's computer: wordmark.svg
  • For point 3, we should be able to tweak arc-green (source link) into a new theme for Gitea to use. I'll note some colours used by the Praxis Institute website:
    • dark header background: #161616
    • light body background: #fefefe
    • wordmark text: #000000
    • paragraph text: #646464
    • blue logo: #00bed6

@kousu
Member Author

kousu commented May 6, 2022

On my end, I emailed R. Foley at Praxis to ask to get DNS reassigned to our existing ComputeCanada instances, so that we will have https://data.praxisinstitute.org in place of https://data.praxisinstitute.org.dev.neuropoly.org.

EDIT: R. Foley got back to say that they don't want to give us praxisinstitute.org, but will talk to their marketing team and decide on an appropriate domain we can have.

@kousu
Member Author

kousu commented May 7, 2022

On the demo server I just saw this probe from Mongolia:

180.149.125.168 - - [07/May/2022:19:21:42 -0400] "GET /c/ HTTP/1.1" 307 180 "-" "Mozilla/5.0 (Windows NT 5.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"

so it occurs to me that maybe we should impose geoblocking on Praxis's server. It's meant to be a pan-Canada project, so maybe we should impose firewall rules that actually enforce that it's only pan-Canadian. That's a bit tricky to do; I guess we can extract ip blocks from MaxMind's geoip-database and feed them into iptables?
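
A rough sketch of that geoblocking idea, assuming the Canadian IP blocks have already been exported from a GeoIP database into a list of CIDR strings. The blocks below are placeholder documentation ranges, not real Canadian allocations, and the function names and the choice of port 443 are illustrative assumptions. Python's `ipaddress` module collapses the blocks, and the script prints allow-then-drop `iptables` rules:

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), standing in for
# whatever a GeoIP export of Canadian networks would actually contain.
ALLOWED_BLOCKS = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24"]


def collapse(blocks):
    """Merge adjacent or overlapping CIDR blocks to keep the rule list short."""
    nets = [ipaddress.ip_network(b) for b in blocks]
    return list(ipaddress.collapse_addresses(nets))


def is_allowed(addr, nets):
    """True if addr falls inside any of the allowed networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in nets)


def iptables_rules(nets, port=443):
    """Accept the listed networks on `port`, then drop everything else on it."""
    rules = [
        f"iptables -A INPUT -p tcp --dport {port} -s {net} -j ACCEPT"
        for net in nets
    ]
    rules.append(f"iptables -A INPUT -p tcp --dport {port} -j DROP")
    return rules


if __name__ == "__main__":
    nets = collapse(ALLOWED_BLOCKS)
    print("\n".join(iptables_rules(nets)))
    # The probing address from the log line above is outside the placeholder list.
    print(is_allowed("180.149.125.168", nets))
```

With a real country list of thousands of blocks, loading them into an `ipset` and matching on the set would probably scale better than one rule per block.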

@kousu
Member Author

kousu commented May 24, 2022

* [ ]  For point 3, we should be able to tweak `arc-green` ([source link](https://github.com/go-gitea/gitea/blob/main/web_src/less/themes/theme-arc-green.less)) into a new theme for Gitea to use. I'll note some colours used by [the Praxis Institute website](https://praxisinstitute.org/):
  
  * dark header background: #161616
  * light body background: #fefefe
  * wordmark text: #000000
  * paragraph text: #646464
  * blue logo: #00bed6

theme-praxis.css:

:root {
  --color-body: #fefefe;
  --color-text: #646464;
  --color-primary: #00bed6; /* blue of the logo */
  --color-primary-dark-1: #00a6bb;
  --color-secondary: #b4008d; /* purple used to give some visual contrast */
  --color-secondary-dark-1: #8a016c;
  --color-warning-bg: #ffb600; /* yellow used as a tertiary colour */
  --color-warning-bg-dark-1: #D99C03;

  --color-menu: var(--color-body);
}

.following.bar.light {
  /* nav bar is dark themed, in contrast to the rest of the site */
  --color-body: #161616;
  --color-text: #bbc0ca;
}

.following.bar.light .dropdown .menu {
  /* but dropdowns within the navbar are back to light themed */
  --color-body: #fefefe;
  --color-text: #646464;
}

.ui.basic.green.buttons .button, .ui.basic.green.button {
  color: var(--color-primary);
}

.ui.basic.green.buttons .button:hover, .ui.basic.green.button:hover {
  color: var(--color-primary-dark-1);
}

.ui.green.buttons .button, .ui.green.button {
  background-color: var(--color-primary);
}

.ui.green.buttons .button:hover, .ui.green.button:hover {
  background-color: var(--color-primary-dark-1);
}

.ui.red.buttons .button, .ui.red.button {
  background-color: var(--color-warning-bg);
}

.ui.red.buttons .button:hover, .ui.red.button:hover {
  background-color: var(--color-warning-bg-dark-1);
}

Looks like:

Screenshot 2022-05-24 at 12-04-28 Gitea Git with a cup of tea
Screenshot 2022-05-24 at 12-03-44 Gitea Git with a cup of tea
Screenshot 2022-05-24 at 12-03-14 Gitea Git with a cup of tea

I went one step slightly further than the default themes and themed the yes/no buttons as blue/yellow (matching colours I got off https://praxisinstitute.org/) instead of the default green/red.

I'll integrate this into ansible this afternoon.

After that I'll replace the logos.

@kousu
Member Author

kousu commented May 25, 2022

* [ ]  For point 1, it was pixelated because we had been using an earlier PNG version of the logo, but we have a nice SVG version now. I'm attaching two slightly modified versions of the logo here:
  
  * [praxis.svg](https://user-images.githubusercontent.com/928742/167062346-6c35192c-fd87-4052-8456-b005fd8dbab1.svg) is a version with black text, suitable for use on a light background (for example, the big main logo)

I followed the instructions:

git clone https://github.com/go-gitea/gitea
cd gitea
cp praxis.svg assets/logo.svg
make generate-images
# wait a while
mkdir -p custom/public/img
cp public/img/{apple-touch-icon.png,avatar_default.png,{favicon,logo}.{png,svg}} custom/public/img # or somewhere else to stage it

But it came out badly:

Screenshot 2022-05-24 at 22-06-10 Gitea Git with a cup of tea

After mulling for a while, I opened it up in Inkscape and saw the problem: the viewport is bizarrely too large. It's set to

viewBox="0 0 1520 1230"

With Inkscape helping me measure things, I worked out that the tight viewport is

viewBox="300 297 935 663"

So here's that file: praxis.svg
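
For anyone who would rather script this than edit the attribute by hand, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `set_viewbox` and the tiny stand-in SVG are assumptions for illustration; the viewBox values are the ones measured above.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Register the SVG namespace as the default so output keeps plain <svg> tags.
ET.register_namespace("", SVG_NS)


def set_viewbox(svg_text, viewbox):
    """Return svg_text with the root element's viewBox attribute replaced."""
    root = ET.fromstring(svg_text)
    root.set("viewBox", viewbox)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Tiny stand-in document; the real praxis.svg has far more content.
    original = (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1520 1230">'
        '<rect x="300" y="297" width="935" height="663"/></svg>'
    )
    print(set_viewbox(original, "300 297 935 663"))
```

Files saved by Inkscape carry extra namespaced attributes, which ElementTree may re-prefix on output, so the result is worth a visual check before using it.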

you can see the difference here

Screenshot_20220524_225602
Screenshot_20220524_225612

With this, it's a lot better:

Screenshot 2022-05-24 at 23-00-06 Gitea Git with a cup of tea

  * [praxis-rev.svg](https://user-images.githubusercontent.com/928742/167062353-93ce125e-5aac-48b7-80f7-e30ac9494946.svg) is a version with white text, suitable for use on a dark background (for example, in the page header if that stays dark).

Thanks, but I think I'm going to end up skipping praxis-rev.svg. For one thing, I'd prefer to control its colours via CSS in the theme file (which is a thing you can do with SVGs); for another, with the text attached the logo is really way too small for the navbar, so I'm just going to cut the text off and leave the blue butterfly-spine, which isn't light/dark sensitive.

And here's that file: logo.svg

With this, the navbar looks better:

Screenshot 2022-05-24 at 23-13-29 Gitea Git with a cup of tea

but now the cover page is missing the title, because the cover logo and the navbar logo are the same file, so I need to separate the two and customize the cover page to know that. I have to anyway to handle the wordmark part.

I put

cp praxis.svg custom/public/img/logo-home.svg
$ find . -name home.tmpl 
./templates/org/home.tmpl
./templates/home.tmpl
./templates/repo/home.tmpl
$ mkdir -p custom/templates
$ cp templates/home.tmpl custom/templates/
$ vi custom/templates/home.tmpl 

And made this:

{{template "base/head" .}}
<div class="page-content home">
	<div class="ui stackable middle very relaxed page grid">
		<div class="sixteen wide center aligned centered column">
			<div>
				<img class="logo" width="220" height="220" src="{{AssetUrlPrefix}}/img/logo-home.svg"/>
			</div>
			<div class="hero">
				<h1 class="ui icon header title">
					{{AppName}}
				</h1>
				<h2>{{.i18n.Tr "startpage.app_desc"}}</h2>
			</div>
		</div>
	</div>
</div>
{{template "base/footer" .}}

As a diff against the stock template:

--- templates/home.tmpl 2021-12-17 11:13:14.466947322 -0500
+++ custom/templates/home.tmpl  2022-05-24 23:20:53.533449377 -0400
@@ -3,7 +3,7 @@
        <div class="ui stackable middle very relaxed page grid">
                <div class="sixteen wide center aligned centered column">
                        <div>
-                               <img class="logo" width="220" height="220" src="{{AssetUrlPrefix}}/img/logo.svg"/>
+                               <img class="logo" width="220" height="220" src="{{AssetUrlPrefix}}/img/logo-home.svg"/>
                        </div>
                        <div class="hero">
                                <h1 class="ui icon header title">
@@ -13,41 +13,5 @@
                        </div>
                </div>
        </div>
-       <div class="ui stackable middle very relaxed page grid">
-               <div class="eight wide center column">
-                       <h1 class="hero ui icon header">
-                               {{svg "octicon-flame"}} {{.i18n.Tr "startpage.install"}}
-                       </h1>
-                       <p class="large">
-                               {{.i18n.Tr "startpage.install_desc" | Str2html}}
-                       </p>
-               </div>
-               <div class="eight wide center column">
-                       <h1 class="hero ui icon header">
-                               {{svg "octicon-device-desktop"}} {{.i18n.Tr "startpage.platform"}}
-                       </h1>
-                       <p class="large">
-                               {{.i18n.Tr "startpage.platform_desc" | Str2html}}
-                       </p>
-               </div>
-       </div>
-       <div class="ui stackable middle very relaxed page grid">
-               <div class="eight wide center column">
-                       <h1 class="hero ui icon header">
-                               {{svg "octicon-rocket"}} {{.i18n.Tr "startpage.lightweight"}}
-                       </h1>
-                       <p class="large">
-                               {{.i18n.Tr "startpage.lightweight_desc" | Str2html}}
-                       </p>
-               </div>
-               <div class="eight wide center column">
-                       <h1 class="hero ui icon header">
-                               {{svg "octicon-code"}} {{.i18n.Tr "startpage.license"}}
-                       </h1>
-                       <p class="large">
-                               {{.i18n.Tr "startpage.license_desc" | Str2html}}
-                       </p>
-               </div>
-       </div>
 </div>
 {{template "base/footer" .}}

And now I've got

Screenshot 2022-05-24 at 23-25-25 Gitea Git with a cup of tea

Which seems to be coming along nicely.

And finally I regenerated the images

cp logo.svg assets/logo.svg
make generate-images
cp public/img/{apple-touch-icon.png,avatar_default.png,{favicon,logo}.{png,svg}} custom/public/img # or somewhere else to stage it

EDIT: turns out, as of yesterday, now there's an extra step:

cp logo.svg assets/logo.svg 
cp assets/logo.svg assets/favicon.svg # see https://github.com/go-gitea/gitea/pull/18542
make generate-images
cp public/img/{apple-touch-icon.png,avatar_default.png,{favicon,logo}.{png,svg}} custom/public/img # or somewhere else to stage it

@kousu
Member Author

kousu commented May 25, 2022

* [ ]  For point 2, I managed to get an SVG version of the wordmark (that is, the word "Neurogitea" but in a fancy font) which doesn't depend on specific fonts being installed on the viewer's computer: [wordmark.svg](https://user-images.githubusercontent.com/928742/167062360-18c799d9-4da6-4911-bfa6-a064f7e115a3.svg)

For this, I put

cp wordmark.svg custom/public/img/neurogitea-wordmark.svg

And did this patch to what I had above:

diff --git a/custom/public/css/theme-praxis.css b/custom/public/css/theme-praxis.css
index a1665744f..d8cf3faea 100644
--- a/custom/public/css/theme-praxis.css
+++ b/custom/public/css/theme-praxis.css
@@ -47,3 +47,9 @@
 .ui.red.buttons .button, .ui.red.button:hover {
   background-color: var(--color-warning-bg-dark-1);
 }
+
+/* the neurogitea wordmark needs some CSS resets to display properly */
+.ui.header > img.logo {
+  max-width: none;
+  width: 500px;
+}
diff --git a/custom/templates/home.tmpl b/custom/templates/home.tmpl
index d7d1d8501..2aaaa176b 100644
--- a/custom/templates/home.tmpl
+++ b/custom/templates/home.tmpl
@@ -7,7 +7,7 @@
                        </div>
                        <div class="hero">
                                <h1 class="ui icon header title">
-                                       {{AppName}}
+                                       <img class="logo" src="{{AssetUrlPrefix}}/img/neurogitea-wordmark.svg"/>
                                </h1>
                                <h2>{{.i18n.Tr "startpage.app_desc"}}</h2>
                        </div>

And now I've got

Screenshot 2022-05-24 at 23-37-11 Gitea Git with a cup of tea

@kousu
Member Author

kousu commented May 25, 2022

Some other praxis-specific things to include:

# app.ini
APP_NAME = Neurogitea

[ui]
THEMES = praxis
DEFAULT_THEME = praxis

[ui.meta]
AUTHOR = "Praxis Spinal Cord Institute"
DESCRIPTION = "Neurogitea connects spinal cord researchers with each other's data"
KEYWORDS = "bids,data sharing,git-annex,datalad,git-lfs,reproducible science" # ?

@kousu
Member Author

kousu commented May 25, 2022

Theming is sitting on https://github.com/neuropoly/computers/pull/332/commits/b365da08c69c67509bbcdcbffe3348cda521cfd0 (sorry it's in the private repo; extracting and publishing to Ansible Galaxy will be a Real-Soon-Now goal)

@kousu
Member Author

kousu commented May 25, 2022

On my end, I emailed R. Foley at Praxis to ask to get DNS reassigned to our existing ComputeCanada instances, so that we will have https://data.praxisinstitute.org in place of https://data.praxisinstitute.org.dev.neuropoly.org.

EDIT: R. Foley got back to say that they don't want to give us praxisinstitute.org, but will talk to their marketing team and decide on an appropriate domain we can have.

They've made a decision: spineimage.ca. I've asked them to assign

      spineimage.ca   206.12.97.250
drone.spineimage.ca   206.12.93.20

When that's done, I'll add those domains in https://github.com/neuropoly/computers/pull/332; and then we should maybe think about pointing data.praxisinstitute.org.dev.neuropoly.org back at some servers on Amazon again to host a staging server we can use without knocking out their prod server.

@kousu
Member Author

kousu commented Jun 1, 2022

Meeting - site_03

We had a meeting today with Praxis, including the first trial data curator.

David Cadotte had helped her already curate the dataset into BIDS. We successfully uploaded it to https://spineimage.ca/TOH/site_03.

User Documentation

David Cadotte has a draft curator tutorial 🔏 . I started the same document on the wiki here but his is further along.

The next step is that David, the trial curator and I are going to upload a trial dataset to https://data.praxisinstitute.org.dev.neuropoly.org/ together. We will be picking a meeting time via Doodle soon.

The curator has been using bids-validator, but it sounded like they were using the python version, not the javascript one. The javascript one is incomplete but the python version is even more incomplete. This is something I should check on when we get together to upload the dataset.

Prod Plan: https://spineimage.ca

In parallel, we will finish up migrating to spineimage.ca, the "prod" site, and sometime next month we should have 4 - 5 curators ready.

Future dev plan: https://spineimage.ca.dev.neuropoly.org

We'll have to repurpose the existing VMs to become prod. But I would like to keep the staging site so we can have something to experiment on. I could experiment locally but I don't have an easy way to turn off or mock https, so it's simpler just to have a mock server with a real cert from LetsEncrypt. I'll rename it spineimage.ca.dev.neuropoly.org.

But there's a problem: ComputeCanada has given us three VMs but only two public IPs, and the current version of neurogitea needs two IPs.

Some ideas:

  • ask ComputeCanada for another VM and two more IPs
    • con: they are probably getting tired of us asking for more resources
  • move the staging server to DigitalOcean or Amazon
    • con: if we size the instances to be even somewhat realistic this is very expensive (~100$/mth)
  • pay for a small VM on DigitalOcean or Amazon, set up a reverse tunnel from the third VM on ComputeCanada using autossh (or maybe even wireguard?)
    • con: added sysadmin complexity
  • merge https://github.com/neuropoly/computers/pull/321
    • con: this goes against Drone's advice to not run everything on a single server; though it works, I've tested it.
  • finish @mguaypaq's work on replacing Drone with an inline bids-validator-only gitea plugin

@kousu changed the title from "Praxis Data Servers" to "Praxis Data Server (https://spineimage.ca)" on Jun 9, 2022
@mguaypaq
Member

mguaypaq commented Jul 12, 2022

Summary for today: we were able to connect with Lisa J. from site_012, and had some pretty good success.

  • We managed to install the required dependencies for curation on Windows, and updated the shared Google doc with the steps taken.
  • We got dcm2bids to run (including dcm2niix), but we suspect the configuration file isn't complete yet. We'll need to read some documentation before we can guide people along the further steps.

@mguaypaq
Member

mguaypaq commented Jul 12, 2022

Also, we noticed that for site_03, the participants.json file contains row data, which should only be in participants.tsv, so we should open an issue with Maryam to fix it, and make the curation documentation clearer on that point.
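
To make that point concrete: in BIDS, participants.json is a data dictionary whose top-level keys describe the columns of participants.tsv, so any key that is not a column name is suspect. Below is a minimal sketch of such a check; the function name and the sample contents are hypothetical, and this is not part of bids-validator.

```python
import csv
import io
import json


def unexpected_json_keys(participants_json_text, participants_tsv_text):
    """Return top-level participants.json keys that are not participants.tsv columns."""
    sidecar = json.loads(participants_json_text)
    reader = csv.reader(io.StringIO(participants_tsv_text), delimiter="\t")
    columns = set(next(reader))  # header row of the TSV
    return sorted(key for key in sidecar if key not in columns)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real site.
    tsv = "participant_id\tage\tsex\nsub-001\t34\tF\n"
    bad_json = json.dumps({
        "age": {"Description": "Age at scan", "Units": "years"},
        "sub-001": {"age": 34, "sex": "F"},  # row data: belongs in the TSV only
    })
    print(unexpected_json_keys(bad_json, tsv))
```

A non-empty result suggests that row data (or a misspelled column name) has leaked into the JSON sidecar.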

@kousu
Member Author

kousu commented Jul 12, 2022

Meeting - site_012

Lisa didn't yet have her dataset curated. We got halfway through curation, and did not start at all on uploading. Also like @mguaypaq said, we were figuring out Windows remotely as we went, never having used much of this software stack ourselves there.

We didn't have Administrator on the computer she was working on. The git installer was able to handle this by itself, but we had to tweak the installer settings for both git-annex and python to make sure to install to C:\Users\%USER%\AppData\Local\Programs and not to try to install anything system-wide.

dcm2niix continues to be tricky because it doesn't have an installer. It just has zip files of binaries. We put it in C:\Users\%USER%\bin, because git-bash has that on its $PATH, but it's unclear if that's a good long-term recommendation. It's in apt and brew, and there's a conda package that could be used on Windows, if we were to get people to install conda first.

By the way, we skipped using pycharm or a virtualenv, and that worked fine. Our curators are not often developers, so explaining virtualenvs is a whole extra can of worms that derails the training. venv only helps when you're developing many python projects and those projects have incompatible dependencies -- regular end users should just be able to pip install anything and mostly have things work (and if they don't, it should be a bug on the shoulders of the developers of that software).

@kousu
Member Author

kousu commented Aug 3, 2022

Backups

Put backups on https://docs.computecanada.ca/wiki/Arbutus_Object_Storage. Even if backups are encrypted, the data sharing agreement says backups need to stay within the cluster.

@kousu
Member Author

kousu commented Sep 16, 2022

Meeting - site_012

Today we had a meeting for 2 hours.

  1. We gave @jcohenadad an account on https://spineimage.ca/, and made sure David Cadotte and our site_012 curator remembered their passwords too.

  2. David Cadotte helped the site_012 curator use dcm2bids_helper and dcm2bids to construct and test a dcm2bids config file. It is a tedious process:

    1. dcm2bids_helper -d sourcedata/$subject_id --force
    2. Examine the contents of all tmp_dcm2bids/helper/*.json files
    3. Create an entry in code/dcm2bids_config.json that matches (SeriesDescription, ProtocolName) from those JSON files with (dataType, modalityLabel, customLabels).
    4. Run dcm2bids -c code/dcm2bids_config.json -d sourcedata/$subject_id -p $sequential_id

    David estimated it takes half an hour per subject, even once the curator is fluent with it.

    (note that the mapping between $subject_id and $sequential_id is secret, and maintained by individual curators and Praxis)

  3. We decided to update the curation protocol by adding a site prefix to the subject IDs, e.g. hal019 for subject 19 from Halifax, ott002 for subject 2 from Ottawa.
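
To illustrate the matching in steps 2 and 3 of the dcm2bids procedure above, here is a small sketch of what one config entry might look like and how it would be tested against a helper sidecar. The field names (`SeriesDescription`, `ProtocolName`, `dataType`, `modalityLabel`, `customLabels`) come from the steps above; the `descriptions`/`criteria` layout and the shell-style wildcard matching reflect my understanding of dcm2bids config files and should be treated as assumptions, and all sample values are hypothetical.

```python
import fnmatch

# One hypothetical entry of code/dcm2bids_config.json, written as a Python dict.
CONFIG = {
    "descriptions": [
        {
            "dataType": "anat",
            "modalityLabel": "T2w",
            "customLabels": "acq-sag",
            "criteria": {
                "SeriesDescription": "*T2*SAG*",
                "ProtocolName": "*T2*",
            },
        }
    ]
}


def matches(sidecar, criteria):
    """True if every criterion matches the sidecar field (shell-style wildcards)."""
    return all(
        fnmatch.fnmatchcase(str(sidecar.get(key, "")), pattern)
        for key, pattern in criteria.items()
    )


def classify(sidecar, config):
    """Return (dataType, modalityLabel, customLabels) of the first matching entry, or None."""
    for desc in config["descriptions"]:
        if matches(sidecar, desc["criteria"]):
            return desc["dataType"], desc["modalityLabel"], desc.get("customLabels", "")
    return None


if __name__ == "__main__":
    # Stand-in for the contents of one tmp_dcm2bids/helper/*.json file.
    sidecar = {"SeriesDescription": "T2_TSE_SAG", "ProtocolName": "T2_TSE"}
    print(classify(sidecar, CONFIG))
```

Running every helper sidecar through a check like this before calling dcm2bids would show which series still lack a config entry, which might shave some time off the half hour per subject.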

@kousu
Member Author

kousu commented Sep 23, 2022

@mguaypaq has helped me merge in HTTP downloads, which means we now can host open access datasets on any server we choose, in any country we can find one.

I've tested it by moving a complete copy of https://github.com/spine-generic/data-single-subject onto https://data.dev.neuropoly.org/: https://data.dev.neuropoly.org/nick.guenther/spine-generic-single/.

Uploading

First, download the current copy:

p115628@joplin:~/datasets$ git clone https://github.com/spine-generic/data-single-subject
Cloning into 'data-single-subject'...
remote: Enumerating objects: 2378, done.
remote: Counting objects: 100% (545/545), done.
remote: Compressing objects: 100% (344/344), done.
remote: Total 2378 (delta 55), reused 384 (delta 49), pack-reused 1833
Receiving objects: 100% (2378/2378), 299.46 KiB | 1.42 MiB/s, done.
Resolving deltas: 100% (578/578), done.
p115628@joplin:~/datasets$ cd data-single-subject/
p115628@joplin:~/datasets/data-single-subject$ git clone^C
p115628@joplin:~/datasets/data-single-subject$ time git annex get
(merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (from amazon...) 

(checksum...) ok                   
get derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-chiba750/anat/sub-chiba750_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-chiba750/anat/sub-chiba750_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-chiba750/anat/sub-chiba750_T2w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-douglas/anat/sub-douglas_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-douglas/anat/sub-douglas_T2star.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-douglas/anat/sub-douglas_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-douglas/dwi/sub-douglas_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-glen/anat/sub-glen_T1w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-glen/anat/sub-glen_T2star.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-glen/anat/sub-glen_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-glen/dwi/sub-glen_dwi.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (from amazon...) 

(checksum...) ok                    
get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (from amazon...) 

(checksum...) ok                    
get sub-mgh/anat/sub-mgh_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-mgh/anat/sub-mgh_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-mgh/anat/sub-mgh_T2w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-mgh/anat/sub-mgh_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-mgh/anat/sub-mgh_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-mgh/anat/sub-mgh_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-mgh/dwi/sub-mgh_acq-b0_dwi.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-mgh/dwi/sub-mgh_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-oxfordFmrib/anat/sub-oxfordFmrib_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-oxfordFmrib/anat/sub-oxfordFmrib_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-oxfordFmrib/dwi/sub-oxfordFmrib_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-perform/anat/sub-perform_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-perform/anat/sub-perform_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-perform/anat/sub-perform_T2w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-perform/anat/sub-perform_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-perform/anat/sub-perform_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-perform/anat/sub-perform_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-perform/dwi/sub-perform_dwi.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-poly/anat/sub-poly_T1w.nii.gz (from amazon...) 

(checksum...) ok                    
get sub-poly/anat/sub-poly_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-poly/anat/sub-poly_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-poly/anat/sub-poly_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-poly/anat/sub-poly_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-poly/anat/sub-poly_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-poly/dwi/sub-poly_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyo750w/anat/sub-tokyo750w_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-tokyo750w/anat/sub-tokyo750w_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-tokyo750w/anat/sub-tokyo750w_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyo750w/anat/sub-tokyo750w_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-tokyo750w/anat/sub-tokyo750w_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-tokyo750w/anat/sub-tokyo750w_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-tokyo750w/dwi/sub-tokyo750w_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoIngenia/anat/sub-tokyoIngenia_T1w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-tokyoIngenia/anat/sub-tokyoIngenia_T2star.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-tokyoIngenia/anat/sub-tokyoIngenia_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoIngenia/dwi/sub-tokyoIngenia_dwi.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-tokyoSigna1/anat/sub-tokyoSigna1_T1w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-tokyoSigna1/anat/sub-tokyoSigna1_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-tokyoSigna1/anat/sub-tokyoSigna1_T2w.nii.gz (from amazon...) 

(checksum...) ok                    
get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoSigna1/dwi/sub-tokyoSigna1_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-tokyoSigna2/anat/sub-tokyoSigna2_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoSigna2/anat/sub-tokyoSigna2_T2w.nii.gz (from amazon...) 

(checksum...) ok                      
get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-tokyoSkyra/anat/sub-tokyoSkyra_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-tokyoSkyra/anat/sub-tokyoSkyra_T2star.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoSkyra/anat/sub-tokyoSkyra_T2w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-tokyoSkyra/dwi/sub-tokyoSkyra_dwi.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-ucl/anat/sub-ucl_T1w.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-ucl/anat/sub-ucl_T2star.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-ucl/anat/sub-ucl_T2w.nii.gz (from amazon...) 

(checksum...) ok                    
get sub-ucl/anat/sub-ucl_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-ucl/anat/sub-ucl_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-ucl/anat/sub-ucl_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                  
get sub-ucl/dwi/sub-ucl_dwi.nii.gz (from amazon...) 

(checksum...) ok                   
get sub-unf/anat/sub-unf_T1w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-unf/anat/sub-unf_T2star.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-unf/anat/sub-unf_T2w.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-unf/anat/sub-unf_acq-MToff_MTS.nii.gz (from amazon...) 

(checksum...) ok                    
get sub-unf/anat/sub-unf_acq-MTon_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-unf/anat/sub-unf_acq-T1w_MTS.nii.gz (from amazon...) 

(checksum...) ok                     
get sub-unf/dwi/sub-unf_dwi.nii.gz (from amazon...) 

(checksum...) ok                     
(recording state in git...)

real    1m10,223s
user    0m52,723s
sys     0m5,956s
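
A transcript this long is tedious to check by eye. Here's a minimal sketch of a summarizer, assuming the plain-text output format shown above (an action line such as `get PATH (from REMOTE...)` or `copy PATH (to REMOTE...)`, eventually followed by a line ending in `ok`). The function name `summarize_annex_log` is hypothetical, and treating "no `ok` before the next action" as a failure is a heuristic, not something git-annex itself defines. Paths containing spaces are not handled.

```python
import re
from collections import Counter

# Matches the action lines seen in the transcript, e.g.
#   get sub-unf/dwi/sub-unf_dwi.nii.gz (from amazon...)
#   copy sub-unf/dwi/sub-unf_dwi.nii.gz (to gg...)
ACTION = re.compile(r"^(get|copy) (\S+) \((from|to) (\S+?)\.\.\.\)\s*$")


def summarize_annex_log(text):
    """Count files per (action, remote) and list files that never got an 'ok'."""
    counts = Counter()
    failed = []
    pending = None  # (action, path, remote) still waiting for its result line
    for raw in text.splitlines():
        line = raw.strip()
        match = ACTION.match(line)
        if match:
            # A new action started before the previous one reported 'ok'.
            if pending is not None:
                failed.append(pending[1])
            pending = (match.group(1), match.group(2), match.group(4))
        elif pending is not None and line.split()[-1:] == ["ok"]:
            # Covers both "(checksum...) ok" and a bare "ok".
            counts[(pending[0], pending[2])] += 1
            pending = None
    if pending is not None:
        failed.append(pending[1])
    return counts, failed
```

Feeding it the saved terminal output returns a counter keyed by `(action, remote)` plus a list of paths worth re-running by hand.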

Then upload:

p115628@joplin:~/datasets/data-single-subject$ git remote add gg [email protected]:nick.guenther/spine-generic-single.git
p115628@joplin:~/datasets/data-single-subject$ git push -u gg master
Enumerating objects: 720, done.
Counting objects: 100% (720/720), done.
Delta compression using up to 128 threads
Compressing objects: 100% (418/418), done.
Writing objects: 100% (720/720), 153.85 KiB | 153.85 MiB/s, done.
Total 720 (delta 272), reused 719 (delta 272), pack-reused 0
remote: Resolving deltas: 100% (272/272), done.
remote: . Processing 1 references
remote: Processed 1 references in total
To data.dev.neuropoly.org:nick.guenther/spine-generic-single.git
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'gg'.
p115628@joplin:~/datasets/data-single-subject$ git annex sync gg
commit 
On branch master
Your branch is up to date with 'gg/master'.

nothing to commit, working tree clean
ok
pull gg 
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (2/2), 137 bytes | 137.00 KiB/s, done.
From data.dev.neuropoly.org:nick.guenther/spine-generic-single
 * [new branch]      git-annex  -> gg/git-annex
ok
(merging gg/git-annex into git-annex...)
(recording state in git...)
push gg 
Enumerating objects: 2093, done.
Counting objects: 100% (2093/2093), done.
Delta compression using up to 128 threads
Compressing objects: 100% (1205/1205), done.
Writing objects: 100% (2092/2092), 135.47 KiB | 6.16 MiB/s, done.
Total 2092 (delta 988), reused 1085 (delta 305), pack-reused 0
remote: Resolving deltas: 100% (988/988), done.
remote: 
remote: Create a new pull request for 'synced/git-annex':
remote:   https://data.dev.neuropoly.org/nick.guenther/spine-generic-single/compare/master...synced/git-annex
remote: 
remote: 
remote: Create a new pull request for 'synced/master':
remote:   https://data.dev.neuropoly.org/nick.guenther/spine-generic-single/compare/master...synced/master
remote: 
remote: .. Processing 2 references
remote: Processed 2 references in total
To data.dev.neuropoly.org:nick.guenther/spine-generic-single.git
 * [new branch]      git-annex -> synced/git-annex
 * [new branch]      master -> synced/master
ok
p115628@joplin:~/datasets/data-single-subject$ time git annex copy --to gg
copy derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (to gg...) 
ok                                
copy derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (to gg...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_T1w.nii.gz (to gg...) 
ok                                 
copy sub-chiba750/anat/sub-chiba750_T2star.nii.gz (to gg...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_T2w.nii.gz (to gg...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (to gg...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (to gg...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (to gg...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (to gg...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (to gg...) 
ok                                
copy sub-douglas/anat/sub-douglas_T1w.nii.gz (to gg...) 
ok                                
copy sub-douglas/anat/sub-douglas_T2star.nii.gz (to gg...) 
ok                                
copy sub-douglas/anat/sub-douglas_T2w.nii.gz (to gg...) 
ok                                
copy sub-douglas/dwi/sub-douglas_dwi.nii.gz (to gg...) 
ok                                
copy sub-glen/anat/sub-glen_T1w.nii.gz (to gg...) 
ok                                
copy sub-glen/anat/sub-glen_T2star.nii.gz (to gg...) 
ok                                
copy sub-glen/anat/sub-glen_T2w.nii.gz (to gg...) 
ok                                
copy sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-glen/dwi/sub-glen_dwi.nii.gz (to gg...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (to gg...) 
ok                                 
copy sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (to gg...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (to gg...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (to gg...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (to gg...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (to gg...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (to gg...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (to gg...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (to gg...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (to gg...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (to gg...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (to gg...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (to gg...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (to gg...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (to gg...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (to gg...) 
ok                                
copy sub-mgh/anat/sub-mgh_T1w.nii.gz (to gg...) 
ok                                
copy sub-mgh/anat/sub-mgh_T2star.nii.gz (to gg...) 
ok                                
copy sub-mgh/anat/sub-mgh_T2w.nii.gz (to gg...) 
ok                                
copy sub-mgh/anat/sub-mgh_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-mgh/anat/sub-mgh_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-mgh/anat/sub-mgh_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-mgh/dwi/sub-mgh_acq-b0_dwi.nii.gz (to gg...) 
ok                                
copy sub-mgh/dwi/sub-mgh_dwi.nii.gz (to gg...) 
ok                                
copy sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w.nii.gz (to gg...) 
ok                                
copy sub-oxfordFmrib/anat/sub-oxfordFmrib_T2star.nii.gz (to gg...) 
ok                                
copy sub-oxfordFmrib/anat/sub-oxfordFmrib_T2w.nii.gz (to gg...) 
ok                                
copy sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-oxfordFmrib/dwi/sub-oxfordFmrib_dwi.nii.gz (to gg...) 
ok                                
copy sub-perform/anat/sub-perform_T1w.nii.gz (to gg...) 
ok                                 
copy sub-perform/anat/sub-perform_T2star.nii.gz (to gg...) 
ok                                
copy sub-perform/anat/sub-perform_T2w.nii.gz (to gg...) 
ok                                
copy sub-perform/anat/sub-perform_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-perform/anat/sub-perform_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-perform/anat/sub-perform_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-perform/dwi/sub-perform_dwi.nii.gz (to gg...) 
ok                                
copy sub-poly/anat/sub-poly_T1w.nii.gz (to gg...) 
ok                                
copy sub-poly/anat/sub-poly_T2star.nii.gz (to gg...) 
ok                                
copy sub-poly/anat/sub-poly_T2w.nii.gz (to gg...) 
ok                                
copy sub-poly/anat/sub-poly_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-poly/anat/sub-poly_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-poly/anat/sub-poly_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-poly/dwi/sub-poly_dwi.nii.gz (to gg...) 
ok                                
copy sub-tokyo750w/anat/sub-tokyo750w_T1w.nii.gz (to gg...) 
ok                                 
copy sub-tokyo750w/anat/sub-tokyo750w_T2star.nii.gz (to gg...) 
ok                                
copy sub-tokyo750w/anat/sub-tokyo750w_T2w.nii.gz (to gg...) 
ok                                
copy sub-tokyo750w/anat/sub-tokyo750w_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyo750w/anat/sub-tokyo750w_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyo750w/anat/sub-tokyo750w_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyo750w/dwi/sub-tokyo750w_dwi.nii.gz (to gg...) 
ok                                
copy sub-tokyoIngenia/anat/sub-tokyoIngenia_T1w.nii.gz (to gg...) 
ok                                
copy sub-tokyoIngenia/anat/sub-tokyoIngenia_T2star.nii.gz (to gg...) 
ok                                
copy sub-tokyoIngenia/anat/sub-tokyoIngenia_T2w.nii.gz (to gg...) 
ok                                
copy sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoIngenia/dwi/sub-tokyoIngenia_dwi.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna1/anat/sub-tokyoSigna1_T1w.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna1/anat/sub-tokyoSigna1_T2star.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna1/anat/sub-tokyoSigna1_T2w.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna1/dwi/sub-tokyoSigna1_dwi.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna2/anat/sub-tokyoSigna2_T2star.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna2/anat/sub-tokyoSigna2_T2w.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi.nii.gz (to gg...) 
ok                                
copy sub-tokyoSkyra/anat/sub-tokyoSkyra_T1w.nii.gz (to gg...) 
ok                                
copy sub-tokyoSkyra/anat/sub-tokyoSkyra_T2star.nii.gz (to gg...) 
ok                                
copy sub-tokyoSkyra/anat/sub-tokyoSkyra_T2w.nii.gz (to gg...) 
ok                                
copy sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-tokyoSkyra/dwi/sub-tokyoSkyra_dwi.nii.gz (to gg...) 
ok                                
copy sub-ucl/anat/sub-ucl_T1w.nii.gz (to gg...) 
ok                                
copy sub-ucl/anat/sub-ucl_T2star.nii.gz (to gg...) 
ok                                
copy sub-ucl/anat/sub-ucl_T2w.nii.gz (to gg...) 
ok                                
copy sub-ucl/anat/sub-ucl_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-ucl/anat/sub-ucl_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-ucl/anat/sub-ucl_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-ucl/dwi/sub-ucl_dwi.nii.gz (to gg...) 
ok                                
copy sub-unf/anat/sub-unf_T1w.nii.gz (to gg...) 
ok                                
copy sub-unf/anat/sub-unf_T2star.nii.gz (to gg...) 
ok                                
copy sub-unf/anat/sub-unf_T2w.nii.gz (to gg...) 
ok                                
copy sub-unf/anat/sub-unf_acq-MToff_MTS.nii.gz (to gg...) 
ok                                
copy sub-unf/anat/sub-unf_acq-MTon_MTS.nii.gz (to gg...) 
ok                                
copy sub-unf/anat/sub-unf_acq-T1w_MTS.nii.gz (to gg...) 
ok                                
copy sub-unf/dwi/sub-unf_dwi.nii.gz (to gg...) 
ok                                
(recording state in git...)

real    0m24,742s
user    0m2,539s
sys     0m3,207s

This DigitalOcean server is crazy fast, by the way: that's 293.66Mb/s upload. Downloading from Amazon was still pretty fast, but "only" about 100Mb/s, even though the DigitalOcean server is in Toronto and the Amazon servers are (supposed to be) in Montreal.
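
For anyone who wants to sanity-check those figures, here's a rough sketch of the arithmetic. It assumes both transfers moved the same set of files (not confirmed above) and takes the quoted 293.66 figure at face value, whatever "Mb" was meant to be. The helper `parse_real` is a hypothetical name; it only handles the `XmY,ZZZs` form that `time` printed in these transcripts.

```python
import re


def parse_real(value):
    """Convert a `time` duration such as '1m10,223s' to seconds.

    The transcripts come from a French locale, so the decimal separator
    is a comma; a dot is accepted as well.
    """
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


upload_seconds = parse_real("0m24,742s")    # `real` for `git annex copy --to gg`
download_seconds = parse_real("1m10,223s")  # `real` for `git annex get` from amazon

# Upload rate as quoted in the comment; the unit is kept exactly as written there.
upload_rate = 293.66

# Back out the transferred volume, then apply it to the download time.
# ASSUMPTION: the same data went over the wire in both directions.
implied_volume = upload_rate * upload_seconds
implied_download_rate = implied_volume / download_seconds

print(f"implied volume: {implied_volume:.0f} Mb")
print(f"implied Amazon download rate: {implied_download_rate:.1f} Mb/s")
```

Under those assumptions the download rate comes out at roughly 103, in line with the "only 100Mb/s" estimate. The download wall-clock time also includes checksum verification, so the raw network rate was probably somewhat higher.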


I went to the repo's settings to make it Public:

(Screenshot, 2022-09-22 21:19:17: spine-generic-single)

Then I downloaded it anonymously, using the same commands we currently tell people to use against the GitHub copy:

Download
p115628@joplin:~/datasets$ time git clone https://data.dev.neuropoly.org/nick.guenther/spine-generic-single.git data-single-subject-http2
Cloning into 'data-single-subject-http2'...
remote: Enumerating objects: 3243, done.
remote: Counting objects: 100% (3243/3243), done.
remote: Compressing objects: 100% (1228/1228), done.
remote: Total 3243 (delta 1614), reused 2595 (delta 1260), pack-reused 0
Receiving objects: 100% (3243/3243), 311.17 KiB | 8.19 MiB/s, done.
Resolving deltas: 100% (1614/1614), done.

real    0m0,317s
user    0m0,171s
sys     0m0,104s
p115628@joplin:~/datasets$ cd data-single-subject-http2
p115628@joplin:~/datasets/data-single-subject-http2$ time git annex get
(merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...) 
(checksum...) ok
get derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (from origin...) 
(checksum...) ok
get sub-chiba750/anat/sub-chiba750_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-chiba750/anat/sub-chiba750_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-chiba750/anat/sub-chiba750_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (from origin...) 
(checksum...) ok                  
get sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-douglas/anat/sub-douglas_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-douglas/anat/sub-douglas_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-douglas/anat/sub-douglas_T2w.nii.gz (from origin...) 
(checksum...) ok
get sub-douglas/dwi/sub-douglas_dwi.nii.gz (from origin...) 
(checksum...) ok                  
get sub-glen/anat/sub-glen_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-glen/anat/sub-glen_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-glen/anat/sub-glen_T2w.nii.gz (from origin...) 
(checksum...) ok
get sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-glen/dwi/sub-glen_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (from origin...) 
(checksum...) ok                  
get sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-mgh/anat/sub-mgh_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-mgh/anat/sub-mgh_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-mgh/anat/sub-mgh_T2w.nii.gz (from origin...) 
(checksum...) ok
get sub-mgh/anat/sub-mgh_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-mgh/anat/sub-mgh_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-mgh/anat/sub-mgh_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-mgh/dwi/sub-mgh_acq-b0_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-mgh/dwi/sub-mgh_dwi.nii.gz (from origin...) 
(checksum...) ok                  
get sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-oxfordFmrib/anat/sub-oxfordFmrib_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-oxfordFmrib/anat/sub-oxfordFmrib_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-oxfordFmrib/dwi/sub-oxfordFmrib_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-perform/anat/sub-perform_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-perform/anat/sub-perform_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-perform/anat/sub-perform_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-perform/anat/sub-perform_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-perform/anat/sub-perform_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-perform/anat/sub-perform_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-perform/dwi/sub-perform_dwi.nii.gz (from origin...) 
(checksum...) ok                  
get sub-poly/anat/sub-poly_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-poly/anat/sub-poly_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-poly/anat/sub-poly_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-poly/anat/sub-poly_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-poly/anat/sub-poly_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-poly/anat/sub-poly_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-poly/dwi/sub-poly_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyo750w/anat/sub-tokyo750w_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-tokyo750w/anat/sub-tokyo750w_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyo750w/anat/sub-tokyo750w_T2w.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyo750w/anat/sub-tokyo750w_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyo750w/anat/sub-tokyo750w_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyo750w/anat/sub-tokyo750w_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyo750w/dwi/sub-tokyo750w_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoIngenia/anat/sub-tokyoIngenia_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-tokyoIngenia/anat/sub-tokyoIngenia_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoIngenia/anat/sub-tokyoIngenia_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoIngenia/dwi/sub-tokyoIngenia_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna1/anat/sub-tokyoSigna1_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-tokyoSigna1/anat/sub-tokyoSigna1_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna1/anat/sub-tokyoSigna1_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna1/dwi/sub-tokyoSigna1_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-tokyoSigna2/anat/sub-tokyoSigna2_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna2/anat/sub-tokyoSigna2_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSkyra/anat/sub-tokyoSkyra_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-tokyoSkyra/anat/sub-tokyoSkyra_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSkyra/anat/sub-tokyoSkyra_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-tokyoSkyra/dwi/sub-tokyoSkyra_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-ucl/anat/sub-ucl_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-ucl/anat/sub-ucl_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-ucl/anat/sub-ucl_T2w.nii.gz (from origin...) 
(checksum...) ok
get sub-ucl/anat/sub-ucl_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-ucl/anat/sub-ucl_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-ucl/anat/sub-ucl_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-ucl/dwi/sub-ucl_dwi.nii.gz (from origin...) 
(checksum...) ok
get sub-unf/anat/sub-unf_T1w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-unf/anat/sub-unf_T2star.nii.gz (from origin...) 
(checksum...) ok
get sub-unf/anat/sub-unf_T2w.nii.gz (from origin...) 
(checksum...) ok                  
get sub-unf/anat/sub-unf_acq-MToff_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-unf/anat/sub-unf_acq-MTon_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-unf/anat/sub-unf_acq-T1w_MTS.nii.gz (from origin...) 
(checksum...) ok
get sub-unf/dwi/sub-unf_dwi.nii.gz (from origin...) 
(checksum...) ok
(recording state in git...)

real    1m13,143s
user    1m4,191s
sys     0m5,872s
p115628@joplin:~/datasets/data-single-subject-http2$ git remote -v
amazon
origin  https://data.dev.neuropoly.org/nick.guenther/spine-generic-single.git (fetch)
origin  https://data.dev.neuropoly.org/nick.guenther/spine-generic-single.git (push)

Well, maybe I take back what I said: DigitalOcean's download was almost identical to Amazon's. Funny that the uplink was faster, usually it's the other way around.

Also to note: because Push-to-Create is turned on in our gitea config, there was very little fumbling around. A single git annex sync --content should be enough to upload everything, no messing with the web UI.

To emphasize again: we now have a fully alternate copy of https://github.com/spine-generic/data-single-subject. Identical download instructions work for it; all someone has to do is swap in https://data.dev.neuropoly.org/nick.guenther/spine-generic-single/. 🎉 (fair warning though: this is a dev server). And with this arrangement, all bandwidth -- git and git-annex -- is paid for through DigitalOcean, instead of splitting the bill between GitHub and Amazon. And when we do promote this to a production server, we can get rid of the difficult contributor AWS credentials.

Unfortunately I've already found one bug that we missed in neuropoly/gitea#19 but it's minor.

kousu commented Dec 1, 2022

Backups

Put backups on https://docs.computecanada.ca/wiki/Arbutus_Object_Storage. Even if backups are encrypted, the data sharing agreement says backups need to stay within the cluster.

I'm working on this now. That wiki page is helpful but I still have to fill in some details, which I am writing down here:

  1. Get dependencies.

    You need openstack. I'm on Arch but this should be portable to Ubuntu:

    sudo pacman -S python-openstackclient
    
  2. Go to https://arbutus.cloud.computecanada.ca/auth/login/?next=/project/ and log in

  3. Download the OpenStack RC file from it.

    Screenshot_20221130_200005

  4. Load the OpenStack RC file.

    $ . ~/def-jcohen-dev-openrc.sh 
    Please enter your OpenStack Password for project def-jcohen-dev as user nguenthe: [ TYPE ARBUTUS PASSWORD HERE ]
    
  5. Create an S3 token.

    Tokens are something I can give out to the backup bot without compromising my complete account.

    openstack ec2 credentials create

    I could probably use openstack ec2 credentials create -f json to script the rest of the process from here, but since this is a rare mostly-one-time operation I'm not going to bother.

    $ openstack ec2 credentials create  # these are not the real credentials, I revoked these after pasting them here
    +------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Field      | Value                                                                                                                                                                                 |
    +------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | access     | 5f192d6ec2124fd3b23ad76036b31107                                                                                                                                                      |
    | links      | {'self': 'https://arbutus.cloud.computecanada.ca:5000/v3/users/885e1521925cb445789bb33f9c0e035ee6e1256d01fbb7257301ec965f86a966/credentials/OS-EC2/5f192d6ec2124fd3b23ad76036b31107'} |
    | project_id | 455810c28e2e4b36a223e6cd2e6abcdc                                                                                                                                                      |
    | secret     | f7828fa44b0a4fa186df7bb6608ff975                                                                                                                                                      |
    | trust_id   | None                                                                                                                                                                                  |
    | user_id    | 885e1521925cb445789bb33f9c0e035ee6e1256d01fbb7257301ec965f86a966                                                                                                                      |
    +------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    
  6. Generate a restic password

    $ pwgen 100 1  # this is not the real password
    wai5woh2gaa0dio6OoSaudoc8AeKufee8Oof6nuNgie7zeitheiciat3aeleghie4Ooqu1juuC3ohroh2ipah9eegaeFurohphah
    
  7. Now, move to the target server and install restic

    # apt install -y restic
    
  8. Still on the target server, provide the credentials to restic:

    As a reminder, restic takes creds via envvars.

    $ cat spineimage.ca-restic.cfg
    export RESTIC_REPOSITORY=s3:object-arbutus.cloud.computecanada.ca/def-jcohen-test
    export RESTIC_PASSWORD=wai5woh2gaa0dio6OoSaudoc8AeKufee8Oof6nuNgie7zeitheiciat3aeleghie4Ooqu1juuC3ohroh2ipah9eegaeFurohphah
    export AWS_ACCESS_KEY_ID=5f192d6ec2124fd3b23ad76036b31107
    export AWS_SECRET_ACCESS_KEY=f7828fa44b0a4fa186df7bb6608ff975
    $ . spineimage.ca-restic.cfg
    
  9. Create the backup repo

    $ restic init
    created restic repository c22ae4a3c4 at s3:object-arbutus.cloud.computecanada.ca/def-jcohen-test
    
    Please note that knowledge of your password is required to access
    the repository. Losing your password means that your data is
    irrecoverably lost.
    $ restic snapshots  # test
    repository c22ae4a3 opened (repository version 2) successfully, password is correct
    

At this point, we only need to keep the restic config. We can toss the OpenStack RC file; if we need it again, we can grab it again.
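
Steps 5 and 8 could be glued together along these lines. This is a minimal sketch, assuming that `openstack ec2 credentials create -f json` prints a JSON object with the same `access` and `secret` fields as the table in step 5; the function name `render_restic_config` is hypothetical and not part of any existing tooling.

```python
import json


def render_restic_config(credentials_json, repository, restic_password):
    """Turn `openstack ec2 credentials create -f json` output into restic env exports."""
    creds = json.loads(credentials_json)
    lines = [
        f"export RESTIC_REPOSITORY={repository}",
        f"export RESTIC_PASSWORD={restic_password}",
        # Assumption: the JSON keys match the "access" and "secret" rows of the table.
        f"export AWS_ACCESS_KEY_ID={creds['access']}",
        f"export AWS_SECRET_ACCESS_KEY={creds['secret']}",
    ]
    return "\n".join(lines) + "\n"
```

The returned text has the same shape as spineimage.ca-restic.cfg in step 8, so it could be written to a file and sourced the same way.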

Now how to interface Gitea with restic?

According to https://docs.gitea.io/en-us/backup-and-restore, backing up Gitea is the standard webapp process: dump the database and save the data folder. It has a gitea dump command, but warns:

Gitea admins may prefer to use the native the MySQL and PostgreSQL dump tools instead. There are still open issues when using XORM for dumping the database that may cause problems when attempting to restore it.

and I indeed ran into this while experimenting a few months ago: you cannot restore a gitea dump, especially if you've gone through a few Gitea versions since taking the backup. So I don't trust gitea dump; all it basically does is run pg_dump > data/gitea-db.sql and then zip the data/ folder. Plus, zipping is slow (it zips the repos!) and may actually make restic work worse by interfering with restic's own compression.

Also, restoring is a manual process:

There is currently no support for a recovery command. It is a manual process that mostly involves moving files to their correct locations and restoring a database dump.

So I'm going to ignore gitea dump and write my own backup script.

Limitations

From the docs:

Buckets are owned by the user who creates them, and no other user can manipulate them.

I can make tokens for everyone who will be adminning this server, but as far as ComputeCanada is concerned, all of them are me. I don't know how to handle this. Maybe when I eventually hand this off to someone else we'll have to copy all the backups to a new bucket.

Alternatives

According to restic, it can talk to OpenStack directly, without using the S3 protocol. But I got it working through S3 and I think that's fine.

@kousu kousu mentioned this issue Dec 5, 2022

kousu commented Dec 6, 2022

Backups

My existing backup scripts on data.neuro.polymtl.ca (#20) look like

git@data:~$ cat ~/.config/restic/s3 
RESTIC_REPOSITORY=s3:s3.ca-central-1.amazonaws.com/data.neuro.polymtl.ca.restic
RESTIC_PASSWORD=xxxxxxxxxxxxxxxxxxxxxxxxxxx
AWS_ACCESS_KEY_ID=aaaaaaaaaaaaaaaaaaaa
AWS_SECRET_ACCESS_KEY=kkkkkkkkkkkkkkkkkkkk

git@data:~$ cat ~/.config/restic/CC
RESTIC_REPOSITORY=sftp://narval.computecanada.ca/:projects/def-jcohen/data.neuro.polymtl.ca.restic
RESTIC_PASSWORD=xxxxxxxxxxxxxxxxxxxxxxxxxxxx

git@data:~$ cat /etc/cron.d/backup-git 
# daily backups
0 2 * * *    git   (set -a; . ~/.config/restic/s3; cd ~; chronic restic backup --one-file-system repositories)
0 3 * * *    git   (set -a; . ~/.config/restic/CC; cd ~; chronic restic backup --one-file-system repositories)

# backup integrity checks
0 4 */3 * *  git   (set -a; . ~/.config/restic/s3; chronic restic check --read-data-subset=1/27)
0 5 */3 * *  git   (set -a; . ~/.config/restic/CC; chronic restic check --read-data-subset=1/9)

# compressing backups by pruning
0 6 * * *    git   (set -a; . ~/.config/restic/s3; chronic restic forget --prune --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 3)
0 7 * * *    git   (set -a; . ~/.config/restic/CC; chronic restic forget --prune --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 3)

That's for Gitolite. Porting this to Gitea is tricky because of the required downtime. This requirement seems to complicate everything: I want backups to run as gitea, but stopping the service forces part of the script to run as root. That means either the whole script needs to start as root and then drop privileges, or start as gitea and use sudo to gain privileges, and it's always risky to try to do limited grants with sudo.

I'm tempted to ignore this requirement. I did some experiments and found that git annex sync --content transfers (which use rsync underneath) continue even after systemctl stop gitea, and git push transfers too, so there's no way to get a 100% consistent snapshot anyway.

Experiment
Peek.2022-12-05.19-10.webm

I'm going to compromise:

# daily backups
0 1 * * *    root   (systemctl stop gitea && su -c 'pg_dump gitea > ~gitea/gitea-db.sql' gitea; systemctl restart gitea)
0 2 * * *    gitea   (set -a; . ~/.config/restic/CC; cd ~; chronic restic backup --one-file-system gitea-db.sql data)

# backup integrity checks
0 4 */3 * *  gitea   (set -a; . ~/.config/restic/CC; chronic restic check --read-data-subset=5G)

# compressing backups by pruning
0 6 * * *    gitea   (set -a; . ~/.config/restic/CC; chronic restic forget --prune --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 3)

This way, while the database and contents of data/ may drift a little apart, the worst that will happen is there are some commits in some repos that are newer than the database, or there are some avatars or other attachments that the database doesn't know about.
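
The ordering in the 01:00 root job can be made explicit with a small Python sketch: stop Gitea, dump the database only if the stop succeeded, and restart Gitea no matter what happened in between. This only illustrates that cron line and is not the deployed script; the function name `dump_gitea_database` and the injectable `run` parameter are hypothetical.

```python
import subprocess


def dump_gitea_database(run=subprocess.run):
    """Mirror `systemctl stop gitea && su -c 'pg_dump ...' gitea; systemctl restart gitea`."""
    try:
        run(["systemctl", "stop", "gitea"], check=True)
        # Same command string as the cron job; the shell started by su expands ~gitea.
        run(["su", "-c", "pg_dump gitea > ~gitea/gitea-db.sql", "gitea"], check=True)
    finally:
        # Runs even when the stop or the dump raised, so the server always comes back up.
        run(["systemctl", "restart", "gitea"], check=True)
```

Passing a fake `run` makes it possible to check the ordering without touching a real server.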

Ansible

I'm working on coding this up in https://github.com/neuropoly/computers/pull/434

EDIT: in that PR, I decided to ignore backup consistency. I think it will be okay. Only a very busy server would have problems anyway, which our servers will definitely not be, and I'm not even convinced it's that big a problem if the git repos and avatars are slightly out of sync with the database. And Gitea already has code to handle resyncing at least some cases, because digital entropy can always cause mistakes. I think at worst, it may fall back to using an older avatar for one person.

kousu commented Feb 10, 2023

I merged neurogitea backups today and wanted to use them for this prod server. But first I had to

Ubuntu Upgrade

I used do-release-upgrade to upgrade drone.spineimage.ca and spineimage.ca.

There was a snag: the upgrade killed postgres-12 and replaced it with postgres-14; it sent me an email warning me to run pg_upgradecluster 12 main before continuing, but I ignored that and overeagerly ran apt-get autopurge. So I lost the database. 😢

Restore

Luckily, I had backups from December (taken above). I did apt-get purge postgresql-common, redeployed, and then followed my own docs to get it back:

root@spineimage:~# systemctl stop gitea
root@spineimage:~# su -l gitea -s /bin/bash
$ bash
gitea@spineimage:~$ restic-no arbutus snapshots
repository 2d22bf7f opened successfully, password is correct
ID        Time                 Host           Tags        Paths
---------------------------------------------------------------------------------
2547ebc9  2022-11-30 21:29:17  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

04e8abc1  2022-11-30 21:56:17  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

e66de5bb  2022-12-01 00:20:08  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

95325d1b  2022-12-01 00:20:29  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

4a530419  2022-12-01 00:20:57  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

ae839c6c  2022-12-01 00:26:27  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

197c4af8  2022-12-01 00:26:47  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

48f35777  2022-12-01 00:27:49  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

0ea0845d  2022-12-01 00:28:49  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

342d08dd  2022-12-01 01:18:25  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

221b5622  2022-12-01 01:20:02  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

4c3d1c67  2022-12-01 01:55:42  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

5b49742d  2022-12-01 01:56:52  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

8cc26371  2022-12-01 01:57:49  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

420e46d9  2023-02-09 23:02:48  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql
---------------------------------------------------------------------------------
15 snapshots
gitea@spineimage:~$ restic-no arbutus restore latest --include gitea-db.sql --target /tmp/r
repository 2d22bf7f opened successfully, password is correct
restoring <Snapshot 420e46d9 of [/srv/gitea/gitea-db.sql /srv/gitea/data] at 2023-02-09 23:02:48.947750225 -0500 EST by gitea@spineimage.ca> to /tmp/r
gitea@spineimage:~$ psql gitea < /tmp/r/gitea-db.sql
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)

SET
SET
SET
SET
SET
SET
ERROR:  relation "access" already exists
[...]
ERROR:  relation "UQE_watch_watch" already exists
ERROR:  relation "UQE_webauthn_credential_s" already exists
gitea@spineimage:~$ exit

Unfortunately I didn't take the backup with pg_dump --clean --if-exists, so I got all these errors. So I manually recreated an empty DB:

root@spineimage:~# sudo -u postgres psql 
psql (14.6 (Ubuntu 14.6-0ubuntu0.22.04.1))
Type "help" for help.

postgres-# \l
                              List of databases
   Name    |  Owner   | Encoding | Collate |  Ctype  |   Access privileges   
-----------+----------+----------+---------+---------+-----------------------
 gitea     | gitea    | UTF8     | C.UTF-8 | C.UTF-8 | 
 postgres  | postgres | UTF8     | C.UTF-8 | C.UTF-8 | 
 template0 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
 template1 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
(4 rows)

postgres-# drop database gitea;
postgres=# create database gitea with owner gitea;
CREATE DATABASE
postgres-# \l
                              List of databases
   Name    |  Owner   | Encoding | Collate |  Ctype  |   Access privileges   
-----------+----------+----------+---------+---------+-----------------------
 gitea     | gitea    | UTF8     | C.UTF-8 | C.UTF-8 | 
 postgres  | postgres | UTF8     | C.UTF-8 | C.UTF-8 | 
 template0 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
 template1 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
(4 rows)

postgres-# \q

and reloaded again:

root@spineimage:~# su -l gitea -s /bin/bash
gitea@spineimage:~$ psql gitea < /tmp/r/gitea-db.sql
SET
[...]
CREATE INDEX
gitea@spineimage:~$ exit
root@spineimage:~# systemctl restart gitea
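
The "already exists" errors came from loading a plain dump on top of a database that already had tables. A minimal sketch of building a dump command with the two flags mentioned above follows; `pg_dump_command` is a hypothetical helper, and the assumption is that the dump is later restored with psql as in the transcript.

```python
def pg_dump_command(database, outfile):
    """Build a pg_dump command whose output drops each object before recreating it."""
    return [
        "pg_dump",
        "--clean",       # emit DROP statements ahead of the CREATE statements
        "--if-exists",   # use DROP ... IF EXISTS so an empty target does not error
        f"--file={outfile}",
        database,
    ]
```

It could be run with `subprocess.run(pg_dump_command("gitea", "gitea-db.sql"), check=True)`. Objects that exist in the target but not in the dump would be left alone, so recreating an empty database is still the cleaner route when the schemas may differ.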

Gitea Upgrade

The redeploy above took care of upgrading Gitea, which went smoothly. It is now at 1.18.3+git-annex-cornerstone

Screenshot 2023-02-10 at 00-52-28 Neurogitea

and it does the inline previews (but I can't demo that here because it's private data).

kousu commented Feb 17, 2023

I just noticed that ComputeCanada is affiliated with https://www.frdr-dfdr.ca/, sponsored by the federal government. They use Globus to upload data where we use git-annex. Should we consider recommending that instead of git?

They don't seem to do access control:

Anyone may use FRDR to search for and download datasets. You do not need to have a Globus Account affiliated with a Canadian postsecondary institution to download datasets in FRDR using Globus.

which rules them out for our use case. Also, I'm pretty sure Globus doesn't do versioning. For example, look through https://www.frdr-dfdr.ca/discover/html/repository-list.html?lang=en and pick, say, https://www.frdr-dfdr.ca/repo/dataset/6ede1dc2-149b-41a4-9083-a34165cb2537: it doesn't show anything labelled "versions" as far as I can see.

kousu commented Mar 7, 2023

Meeting - site_012

Today our site 12 curator was able to finish a dcm2bids config file and curate all subjects from her site, with David's advice. We included several t2starw images that David initially thought we should drop, until we realized they made up a large portion of Lisa's dataset.

One subject was dropped -- the previous sub-hal001 -- and the other subject IDs were renumbered to start counting at sub-hal001.

There was also some debate about whether to tag sagittal scans with acq-sag or acq-sagittal. At poly we've used acq-sagittal but Ottawa's dataset uses acq-sag. At neuropoly we will standardize on this internally to match.

Right now curators have to manually run dcm2bids for each subject. @valosekj and @mguaypaq and I think it should be possible to write the loop for this, and write that into a script in each dataset's code/ folder. That would make curation more robust. @valosekj pointed out we can store the imaging IDs (the IDs used for each dataset's sourcedata/${ID}) in participants.tsv, using the 'source_id' column that we've standardized on. I think we can probably write a loop script that we can share between curators that reads their participants.tsv to know what sourcedata/ folder to look at (and maybe even patch dcm2bids with a batch mode that understands doing just that). That would be a lot more reliable than having curators run through the curation subject by subject every time they tweak the config file.
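
A loop script of that kind might look roughly like the sketch below. It assumes participants.tsv has the standard `participant_id` column plus the `source_id` column described above, that the DICOMs live in `sourcedata/<source_id>` under the dataset root, and that dcm2bids wants the participant label without the `sub-` prefix; the function name `dcm2bids_commands` is hypothetical.

```python
import csv
from pathlib import Path


def dcm2bids_commands(participants_tsv, dataset_root, config):
    """Build one dcm2bids command per row of participants.tsv.

    participants_tsv is an open text file (or any iterable of lines).
    """
    root = Path(dataset_root)
    commands = []
    for row in csv.DictReader(participants_tsv, delimiter="\t"):
        label = row["participant_id"]
        # Assumption: dcm2bids adds the "sub-" prefix itself, so strip it here.
        if label.startswith("sub-"):
            label = label[len("sub-"):]
        commands.append([
            "dcm2bids",
            "-d", str(root / "sourcedata" / row["source_id"]),
            "-p", label,
            "-c", str(config),
            "-o", str(root),
        ])
    return commands
```

A curator could then rerun the whole dataset after each config tweak by opening participants.tsv and calling `subprocess.run(cmd, check=True)` on each returned command.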

We have not yet run bids-validator on the dataset.

We spent a while trying to get the curator able to upload to spineimage.ca. I hoped we could start by committing and publishing the config file, and then adding subject data in steps, and refining the config file with more commits from there. I hoped it would be quick but we hit bugs and ran out of time for today. ssh is giving this error:

Screenshot_20230307_151447

We double-checked by using a few different servers and ports, and also PuTTY, an entirely separate program that should have nothing to do with the ssh that comes with Windows Git Bash:

Screenshot_20230307_151723
Screenshot_20230307_151800

In all cases a closed port times out, while an open port gives this error after a successful TCP handshake. I remember the connection working back in July when we were first in touch with this site, but since then their hospital IT has done upgrades and now there seems to be some sort of firewall in the way. I will follow up with the curator by email with specific debugging instructions she can relay to her IT department.

kousu commented May 22, 2023

Meeting - site_012

Today, because Halifax's IT department is apparently backlogged by months, we debugged further but without success. We created a short python script that basically just implements printf 'GET /\r\n\r\n' | nc $HOST $PORT and ran it against a few combinations of hosts and ports. In all cases, this worked, but when using an SSH client we are blocked. So SSH seems to be blocked as a protocol: when connecting to port 443 on spineimage.ca via https://spineimage.ca:443, it worked, but when I reconfigured the server to run ssh on that same port and we tried both ssh://spineimage.ca:443 and sftp://spineimage.ca:443, it timed out. Similarly, ssh -v reports "connected" before hanging and reporting "software caused connection abort" -- so the TCP connection to port 22 is allowed, but somehow the content of that connection is tripping it up. I suspect there is a deep-packet-inspecting firewall involved that is specifically blocking ssh 💢.
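
For reference, a probe equivalent to that nc one-liner can be sketched as below. This is a reconstruction for illustration, not the script used in the meeting; the function name `probe` and its defaults are assumptions.

```python
import socket


def probe(host, port, payload=b"GET /\r\n\r\n", timeout=10.0):
    """Send payload over a fresh TCP connection and return the first bytes received."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(4096)
        except socket.timeout:
            # Connected, but the peer (or something in between) never answered.
            return b""
```

Pointed at a web server this should return the start of an HTTP response. Pointed at an SSH port it would typically return the server's identification line beginning with `SSH-`, which is one way to tell whether the SSH banner gets through at all.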

As a workaround, we constructed a .tar.gz of the dataset and encrypted it using these tools: https://stackoverflow.com/a/16056298/2898673. This resulted in a 1GB encrypted file, site_012.tar.gz.enc, that's on Lisa's office machine. Mathieu and I have the decryption key saved securely on our machines. The curator is going to try to use a file hosting service NSHealth runs. If that fails, it might be possible to create an Issue in https://spineimage.ca/NSHA/site_012/issues and drag-and-drop the file onto it (so it becomes an attachment there), and in the worst case, it can be mailed on a thumbdrive to us at

Julien Cohen-Adad
Ecole Polytechnique, Genie electrique
2500, Chemin Polytechnique, Porte S-114
Montreal, QC
H3T 1J4
Canada

(ref: https://neuro.polymtl.ca/contact-us.html)

In the future, perhaps as a different workaround we can set up an HTTPS proxy; if the problem is a deep-packet-inspecting firewall, wrapping the ssh connection in an HTTPS one should defeat it. I believe these instructions solve that: https://stackoverflow.com/a/23616021/2898673. We can pursue that if/when we need to do this again.

EDIT: the curator created an account for us on https://sfts1.gov.ns.ca/ and sent us the encrypted file, and @mguaypaq was able to download it and upload the contents to https://spineimage.ca. On Thursday, May the 18th, there was a final meeting to demonstrate to the Halifax curator that their work was done as far as it can be for now. If we need their help making edits to the data we will be right back here unless we figure out some kind of proxy situation.

kousu commented May 23, 2023

Backups ("offsite")

Right now we only have two backups: one on spineimage.ca:/var, which doesn't really count as a good backup, and the one I created above, which, like the server, is on Arbutus in Victoria, and therefore one natural disaster could wipe out all the data. Moreover, there are not very many key holders -- just me at the moment -- and the data is stored inside of an OpenStack project owned by @jcohenadad, all of which makes neuropoly a single point of failure.

https://github.com/neuropoly/computers/blob/722545d38adc688fa621e4e25792371f36edd7fe/ansible/host_vars/spineimage.ca.yml#L2-L48

We should have other physical locations, to protect against natural disasters. The data sharing agreement requires us to stick to ComputeCanada as a line of defense against leaks, but as of recently most of their clusters run OpenStack, so we can choose a different physical location than Arbutus.

We should also have other keyholders, ones who do not work for neuropoly so Praxis doesn't risk losing the data if we mess up or are attacked and get our accounts locked or wiped.


Towards all this I have been asking Praxis for help and they have found a keyholder. This person has been granted a separate ComputeCanada account and is ready to take on keyholding. They are apparently comfortable with the command line but don't have a lot of time to be involved, but they can hold the keys and, hopefully, bootstrap disaster recovery when needed.

Requesting Cloud Projects

In February, I emailed tech support, because despite seeing the list of alternate clouds, the sign up form doesn't provide a way to request one. They were extremely helpful about this:

To: "Nick Guenther" [email protected]
From: Jean-François Landry via Cloud Support [email protected]
Date: Fri, 17 Feb 2023 22:02:57 +0000

2023-02-17 16:08 (America/Toronto) - Nick Guenther wrote:

How may we request resources on cedar.cloud.computecanada.ca or
beluga.cloud.computecanada.ca? The google form at
https://docs.google.com/forms/d/e/1FAIpQLSeU_BoRk5cEz3AvVLf3e9yZJq-OvcFCQ-mg7p4AWXmUkd5rTw/viewform
doesn't allow choosing which cluster to use.

There is no specific option, just ask nicely in the free form description box.

Also, may we request a cloud allocation of only object storage? The form forces us
to allocate at least 1 VM and one 20GB disk and 1 IP. Allocating and not using a
virtual disk isn't that expensive for you, but allocating and not using an IP
address is quite so and I don't want to waste one.

You can. Again, no specific "object store only" cloud RAS allocation, just fill in the minimum for VCPU/RAM etc. and please explain in the free form description box.

You can get up to 10TB of object storage through cloud RAS.

They also added

There is no geo-distributed storage system period, but the Arbutus object store works great with restic (note that restic tried to pack chunks into 16MB minimum objects by default so it will not generate hundreds of millions of tiny objects). Also please update to the latest 0.15.1 release, the new v2 repo format is considered stable and does include zstd compression by default.

So I don't expect any problems requesting storage for backups from them. It sounds like they are familiar with and use restic all the time.

Lack of Existing Keyholders

I realized that for the existing backups, there is only one restic key credential and also probably only one s3 credential to go with it at the moment, the one used by the bot:

$ echo $RESTIC_REPOSITORY 
s3:object-arbutus.cloud.computecanada.ca/def-jcohen-test2
$ restic key list
repository 2d22bf7f opened (version 1)
found 2 old cache directories in /home/kousu/.cache/restic, run `restic cache --cleanup` to remove them
 ID        User   Host           Created
----------------------------------------------------
*8bd433bf  gitea  spineimage.ca  2022-11-30 20:57:11
----------------------------------------------------
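
The output above shows the repository is still at version 1, while the note from support recommends the v2 format for its zstd compression. A possible upgrade path is sketched below, assuming every client that touches the repository runs restic 0.14 or newer, since older releases cannot open v2 repositories. `upgrade_repo_to_v2` is a hypothetical wrapper name; the restic subcommands inside it are the real ones.

```shell
#!/bin/sh
# Sketch: upgrade a version 1 restic repository to v2, then compress old data.
# Assumes RESTIC_REPOSITORY, RESTIC_PASSWORD and the S3 credentials are exported.
# upgrade_repo_to_v2 is a hypothetical name, not a restic command.
upgrade_repo_to_v2() {
    # Check integrity first; a damaged repository should not be migrated.
    restic check || return 1
    # Switch the repository format from version 1 to version 2.
    restic migrate upgrade_repo_v2 || return 1
    # Rewrite packs written before the upgrade so old snapshots get compressed too.
    restic prune --repack-uncompressed
}
```

Nothing runs until `upgrade_repo_to_v2` is called. The final prune rewrites existing data, so it may take a while on a large repository.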

I am going to add s3+restic key credentials for:

I've done this by running

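# create S3 credentials for the new keyholder
openstack ec2 credentials create -c access -c secret
# generate a long random restic password and note it down
PW=$(pwgen 100 1); echo "RESTIC_PASSWORD=$PW"
# feed the new password twice (entry + confirmation)
(echo "$PW"; echo "$PW") | restic key add --user "$name" --host "$institution.tld"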

for each person. I have the notes saved on /tmp and will be distributing them as securely as I can.
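
For handing the notes over, one option is a small env file per keyholder that they can source before running restic. This is a minimal sketch: `keyholder_env` is a hypothetical helper (not part of restic or openstack), and it assumes none of the values contain single quotes, which holds for default pwgen output.

```shell
#!/bin/sh
# Hypothetical helper: print one keyholder's credentials as a sourceable env file.
# $1 = restic repository URL, $2 = S3 access key, $3 = S3 secret key, $4 = restic password
# Assumes no value contains a single quote.
keyholder_env() {
    cat <<EOF
export RESTIC_REPOSITORY='$1'
export AWS_ACCESS_KEY_ID='$2'
export AWS_SECRET_ACCESS_KEY='$3'
export RESTIC_PASSWORD='$4'
EOF
}
```

A possible use is `(umask 077; keyholder_env "$RESTIC_REPOSITORY" "$access" "$secret" "$PW" > "/tmp/$name.env")`, where `$access` and `$secret` stand for the values printed by `openstack ec2 credentials create`. The `umask 077` keeps the file unreadable by other users on the machine.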

@kousu
Member Author

kousu commented May 24, 2023

Backup Keyholder Onboarding

On Wednesday the 24th we are going to have a meeting with Praxis's nominee where we:

  1. Have them install restic

  2. Provide them restic credentials to the existing backups

  3. Test by having them do restic snapshots and restic ls latest

  4. Mention that restic disaster recovery docs are at https://restic.readthedocs.io/en/stable/050_restore.html

  5. Mention that the creds include s3 creds so they can be used with s3cmd or aws-cli

  6. Walk them through requesting a cloud project of their own.

    It should be on Graham, geographically separate from existing server/backups, and it doesn't need an IP address wasted on it. Here's the application form filled out with copy-pasteable answers:

    ~~Cloud Application Form~~
    • Request type: New project + RAS request

    • Project Type: persistent

    • Project name suffix: custom -> backup

    • VCPUs: 1

    • Instances: 1

    • Volumes: 1

    • Volume snapshots: 0

    • RAM: 1.5

    • Floating IPs: 1

    • Persistent storage: 20

    • Object storage: 1000

    • Shared filesystem storage: 0

    • Explain why you need cloud resources:

      I am working with a team hosting a research data server https://spineimage.ca on Arbutus that is looking for storage space for backups.

      We only need object storage. Please do not actually allocate any VMs, volumes, and especially no IP addresses for this.

      Please allocate the cloud project on Graham, so that a disaster at Arbutus will not risk our backups.

      Thank you!

    • Explain why the various Compute Canada HPC clusters are not suitable for your needs:

      The HPC clusters are primarily for compute, not storage.

    • Explain what your plan is for efficiently using the cloud resources requested:

      We are using restic, a deduplicating and compressing backup system. We do not need any compute
      resources allocated, only storage. We are requesting only as much storage
      as we have provisioned on the original server, and do not expect to fill either up at this time.

    • Describe your plans for maintenance and security upkeep:

      We do not intend to run a server under this cloud allocation.

    EDIT: we were misinformed.

    2023-05-24 16:01 Lucas Whittington via Cloud Support wrote:

    Unfortunately, Arbutus is the only Alliance cloud that provides object
    storage. It is stored on separate machines from our volume cluster but won't
    protect you in the event of an incident that affects our entire data centre. Let
    me know if you would like to proceed.

    Instead, we will build our own file server on the other cluster. I'll either use minio or just sftp. Here's the updated request:

    Cloud Application Form
    • Request type: New project + RAS request

    • Project Type: persistent

    • Project name suffix: custom -> backup

    • VCPUs: 1

    • Instances: 1

    • Volumes: 1

    • Volume snapshots: 0

    • RAM: 1.5

    • Floating IPs: 1

    • Persistent storage: 1000

    • Object storage: 0

    • Shared filesystem storage: 0

    • Explain why you need cloud resources:

      I am working with a team hosting a research data server https://spineimage.ca on Arbutus that is looking for storage space for backups.

      Please allocate the cloud project on Graham, so that a disaster at Arbutus will not risk our backups.

      Thank you!

    • Explain why the various Compute Canada HPC clusters are not suitable for your needs:

      The HPC clusters are primarily for compute, not storage.

    • Explain what your plan is for efficiently using the cloud resources requested:

      We are using restic, a deduplicating and compressing backup system. We are requesting only as much storage
      as we have provisioned on the original server, and do not expect to fill either up at this time.

    • Describe your plans for maintenance and security upkeep:

      We will enable fail2ban and Debian's unattended-upgrades, and install netdata as an alerting system.

  7. Leave them with instructions on how to generate and send us countervailing s3 creds

    S3 Credential Generation

    (based on https://docs.alliancecan.ca/wiki/Arbutus_object_storage)

    1. Install openstack

      • brew install openstackclient
      • apt install python3-openstackclient
      • otherwise: pip install python-openstackclient
    2. Log in to https://graham.cloud.computecanada.ca

      [Screenshot: OpenStack Dashboard login page]

    3. Download the OpenStack RC File from under your profile menu in the top right corner

      [Screenshot: OpenStack RC File entry in the profile menu]

    4. Load it into your shell

      $ . /tmp/def-jcohen-dev-openrc.sh
      Please enter your OpenStack Password for project def-jcohen-dev as user nguenthe: [ CLOUD PASSWORD HERE ]
      
    5. Make S3 credentials:

      $ openstack ec2 credentials create -c access -c secret
      
      +--------+----------------------------------+
      | Field  | Value                            |
      +--------+----------------------------------+
      | access | 5390ea0b6d4001ccb1093c91b311e181 |
      | secret | 1f93ff01ddcae38594c5fcfceb24b850 |
      +--------+----------------------------------+
      

      'access' is AWS_ACCESS_KEY_ID and 'secret' is AWS_SECRET_ACCESS_KEY, as used by restic or s3cmd or awscli.

    We need s3 credentials generated for:

    Please forward the credentials privately to each individual keyholder. We will discuss at the meeting what the safest way to do that is.

  8. I will initialize the new repository, generate a RESTIC_PASSWORD for each keyholder with pwgen 100 1, and hand them out.
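
The access test in step 3, plus a trial restore along the lines of the docs in step 4, could be wrapped up as below. This is only a sketch: `check_backup_access` is a hypothetical name, and it assumes the keyholder has already exported RESTIC_REPOSITORY, RESTIC_PASSWORD, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

```shell
#!/bin/sh
# Hypothetical smoke test for a new keyholder (not part of restic).
# Assumes the restic and S3 credentials are already exported in the environment.
# $1 (optional) = scratch directory for a trial restore
check_backup_access() {
    # Listing snapshots shows that the S3 credentials and the restic key both work.
    restic snapshots || return 1
    # Listing the newest snapshot shows that its contents can be read.
    restic ls latest || return 1
    # Only when a directory is given: restore the newest snapshot into it.
    if [ -n "${1:-}" ]; then
        restic restore latest --target "$1" || return 1
    fi
}
```

Calling `check_backup_access` alone only reads metadata. Calling it with a directory, e.g. `check_backup_access /tmp/restore-test`, downloads the whole latest snapshot, so it needs enough free disk space.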

@kousu
Member Author

kousu commented Nov 7, 2023

Disk Latency Problem

I just came up against this after rebooting:

Nov 07 07:39:30 spineimage.ca systemd[1]: Finished systemd-networkd-wait-online.service - Wait for Network to be Configured.
Nov 07 07:39:31 spineimage.ca systemd[1]: dev-disk-by\x2duuid-2067a784\x2d07ef\x2d4317\x2d88d0\x2d4591442577d1.device: Job dev-disk-by\x2duuid-2067a784\x2d>
Nov 07 07:39:31 spineimage.ca systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-2067a784\x2d07ef\x2d4317\x2d88d0\x2d4591442577d1.device - /dev/d>
Nov 07 07:39:31 spineimage.ca systemd[1]: Dependency failed for systemd-fsck@dev-disk-by\x2duuid-2067a784\x2d07ef\x2d4317\x2d88d0\x2d4591442577d1.service ->
Nov 07 07:39:31 spineimage.ca systemd[1]: Dependency failed for srv-gitea.mount - /srv/gitea.
Nov 07 07:39:31 spineimage.ca systemd[1]: Dependency failed for gitea.service - Gitea (Git with a cup of tea).
Nov 07 07:39:31 spineimage.ca systemd[1]: gitea.service: Job gitea.service/start failed with result 'dependency'.
Nov 07 07:39:31 spineimage.ca systemd[1]: srv-gitea.mount: Job srv-gitea.mount/start failed with result 'dependency'.

i.e. /srv/gitea wasn't mounted, so gitea wasn't running. Can we make this more reliable somehow?

After a second reboot, it came up fine. So I don't know, maybe it was a fluke.
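
If the volume was simply slow to attach, one thing to try is giving systemd longer to wait for the device, via the `x-systemd.device-timeout=` mount option in /etc/fstab. Whether that was the cause here is an assumption; the log only shows that the wait timed out. A minimal sketch that adds the option to one entry, where `extend_device_timeout` is a hypothetical helper and the 5min value is a guess:

```shell
#!/bin/sh
# Sketch: append x-systemd.device-timeout to the options of one fstab entry.
# Reads fstab text on stdin and writes the edited text to stdout; edits nothing in place.
# $1 = mount point, $2 = timeout (e.g. 5min). Hypothetical helper name.
extend_device_timeout() {
    awk -v mnt="$1" -v opt="x-systemd.device-timeout=$2" '
        # Skip comment lines; touch only the entry whose mount point matches.
        $1 !~ /^#/ && $2 == mnt { $4 = $4 "," opt }
        { print }
    '
}
```

For example, `extend_device_timeout /srv/gitea 5min < /etc/fstab > /tmp/fstab.new` produces a candidate file to diff against the original before installing it. The edited line has its whitespace collapsed to single spaces, which fstab accepts.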
