Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Micro #67

Merged
merged 17 commits into from
Apr 27, 2020
Merged

Micro #67

merged 17 commits into from
Apr 27, 2020

Conversation

DavidGOrtega
Copy link
Contributor

Introduces the new architecture.

Still to do before merge:

  • Remove previous unneeded code (can be done in a separate PR)
  • Improve docs (can be done in a separate PR)

I have uploaded an image with this work to be tested:
dvcorg/dvc-cml:micro

with this workflow:

name: CML

on: [push, pull_request]

jobs:
  cml:
    runs-on: [ubuntu-latest]
    container:
      image: docker://dvcorg/dvc-cml:micro

    steps:
      - uses: actions/checkout@v2

      - name: dvc_action_run
        shell: bash
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

        run: |
          cml-setup

          apt-get update && apt-get install -y python-pip && pip install --upgrade pip
          pip install tensorflow wget

          dvc pull -f
          dvc repro train.dvc
          dvc push

          echo "# CML report" >> report.md
          dvc metrics diff --show-json origin/master HEAD | cml-metrics >> report.md
          dvc diff --show-json origin/master HEAD | cml-files >> report.md

          cml-send-report -p report.md 

@DavidGOrtega
Copy link
Contributor Author

@dmpetrov please consider what to be done with the code removal and or docs.
I think that what is missing in docs is an explanation of every cml command.

@DavidGOrtega DavidGOrtega requested a review from dmpetrov April 23, 2020 13:12
Copy link
Member

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @DavidGOrtega! This design is much better than the monolith one.
Comments are inline.

README.md Outdated
test -f requirements.txt && pip3 install -r requirements.txt
# Run report:
dvc_cml_run
cml-setup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed to extract this code to bash level.

README.md Outdated
# Run report:
dvc_cml_run
cml-setup

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please replace on: [push, pull_request] to one event. push is the best - we discussed this on slack.

README.md Outdated
cml-setup

apt-get update && apt-get install -y python-pip && pip install --upgrade pip
pip install tensorflow wget
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go through the requirements file
test -f requirements.txt && pip3 install -r requirements.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is only my project

README.md Outdated
dvc_cml_run
cml-setup

apt-get update && apt-get install -y python-pip && pip install --upgrade pip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not python3? py2 was deprecated already. We should force the best practices :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its just my project, I have to update wiki with the new approach

README.md Outdated
pip install tensorflow wget

dvc pull -f
dvc repro train.dvc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the docs we should use the most general case. So, dvc repro is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how, in the example it should be two steps train and eval. In fact in my example its wrong should be eval to run the whole pipeline. So the two steps pipeline should be train.dvc and Dvcfile?

src/settings.js Outdated
const DVC_TAG_PREFIX = '';
const CI_SKIP_MESSAGE = '[ci skip]';
const METRICS_FORMAT = '0[.][0000000]';
const BASELINE = 'origin/master';
const REPRO_TARGETS = ['Dvcfile'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed anymore

src/settings.js Outdated
const DVC_TAG_PREFIX = '';
const CI_SKIP_MESSAGE = '[ci skip]';
const METRICS_FORMAT = '0[.][0000000]';
const BASELINE = 'origin/master';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the script level

const DVC_TAG_PREFIX = '';
const CI_SKIP_MESSAGE = '[ci skip]';
const METRICS_FORMAT = '0[.][0000000]';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI_SKIP_MESSAGE and METRICS_FORMAT - to the command option

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not dooing skip anymore right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

src/settings.js Outdated
const DVC_TAG_PREFIX = '';
const CI_SKIP_MESSAGE = '[ci skip]';
const METRICS_FORMAT = '0[.][0000000]';
const BASELINE = 'origin/master';
const REPRO_TARGETS = ['Dvcfile'];
const A_REV = BASELINE;
const B_REV = 'HEAD';
const INPUT_SKIP = '-';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A_REV, B_REV, INPUT_SKIP - all on the script level

src/settings.js Outdated
const DVC_TAG_PREFIX = '';
const CI_SKIP_MESSAGE = '[ci skip]';
const METRICS_FORMAT = '0[.][0000000]';
const BASELINE = 'origin/master';
const REPRO_TARGETS = ['Dvcfile'];
const A_REV = BASELINE;
const B_REV = 'HEAD';
const INPUT_SKIP = '-';

exports.DVC_TITLE = DVC_TITLE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Script level

@DavidGOrtega
Copy link
Contributor Author

Settings is actually not used by the new architecture but the old one

@dmpetrov
Copy link
Member

Settings is actually not used by the new architecture but the old one

Right. This is why we need to remove it.

@DavidGOrtega
Copy link
Contributor Author

@dmpetrov Im going to remove the dvc-cml code in another PR if not this PR would be huge.

I have included the send report as comment in gitlab since tags were broken. Not broken but you needed to do lots of stuff manually that differs a lot with GH.

The work can be tested with the container at:
dvcorg/dvc-cml:micro

Copy link
Member

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good improvements on the comment and cleaning code. But the major issues still need to be addressed.

src/cml.js Outdated
@@ -1,17 +1,17 @@
const print = console.log;
// console.log = console.error;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented code? Please remove all the commented code without a good reason.

Copy link
Contributor Author

@DavidGOrtega DavidGOrtega Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outdated... The code that I committed don't have it, maybe an issue in GH UI?

dvc_cml_run

# needed to be able to do dvc metrics and dvc diff
git fetch --prune --unshallow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it a part of actions/checkout@v2? I hope it supports all these options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do as stated by them:

- uses: actions/checkout@v2
- run: |
    git fetch --prune --unshallow

but I think is easier to loose the track. What do you think? should I move it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it in their docs. yep, it looks good, let's keep it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have understood to keep it in our side and not put it in the checkout action. Also one point to remember is the gitlab similarity. GL does not have that action

README.md Outdated
dvc push

echo "# CML report" >> report.md
dvc metrics diff --show-json origin/master HEAD | cml-metrics >> report.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not to extract BASELINE as a variable?

bin/cml-files.js Outdated

const { metrics_args, diff_run, error_handler } = require('../src/cml');
const opts = metrics_args();
diff_run(opts).catch(e => error_handler(e));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the code needs to be extracted from scm/cml (except common libs).
Each command is supposed to be as independent as possible. It should simplify command replacement for users.


const { metrics_args, metrics_diff_run, error_handler } = require('../src/cml');
const opts = metrics_args();
metrics_diff_run(opts).catch(e => error_handler(e));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. Please move all then needed code from src/cml to here.

} = require('../src/cml');

const opts = send_report_args();
send_report_run(opts).catch(e => error_handler(e));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same.

@@ -0,0 +1,5 @@
#!/usr/bin/env node

const { setup_env_remote, error_handler } = require('../src/cml');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

@@ -0,0 +1,36 @@
const fs = require('fs');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why do we need this module. We should just read stdin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read the buffer, stdin buffer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need a buffer? you can read everything at once. if it fails - totally fine since our limit is way below any machine memory restriction.

I assume there is a 1-3 line of JS code that can do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there is a 1-3 line of JS code that can do this.

No, I dont think so, code was much simpler before but with DVC output breaks. I can try a library like https://www.npmjs.com/package/get-stdin
but the code is prettty similar to mine using buffer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. let's keep it for now

@DavidGOrtega
Copy link
Contributor Author

DavidGOrtega commented Apr 27, 2020

Is there a reason to not to extract BASELINE as a variable?

Whats the benefit? You are setting it on there or in the variable all that its in the yaml file. I dont see the difference and its worse since we handle code that its not needed. Maybe Im losing the point

@dmpetrov
Copy link
Member

Whats the benefit?

The point:

  1. Many commands use the same baseline - avoid code duplication.
  2. This additional line of code will serve as documentation.

@DavidGOrtega
Copy link
Contributor Author

OK,

in the bash or in the ENV?

BASELINE="origin/master"
dvc metrics diff --show-json "$BASELINE" HEAD

Copy link
Member

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change @DavidGOrtega
One question inline - please answer

README.md Outdated
dvc metrics diff --show-json origin/master HEAD | cml-metrics >> report.md
dvc diff --show-json origin/master HEAD | cml-files >> report.md
dvc metrics diff --show-json "$BASELINE" HEAD | cml-metrics >> report.md
dvc diff --show-json "$BASELINE" HEAD | cml-files >> report.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to compare it to HEAD? It means the results need to be committed.
Based on my understanding it should look like dvc diff --show-json "$BASELINE" | ....
Could you please make sure which approach is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants