This repository has been archived by the owner on Apr 17, 2023. It is now read-only.

crono synch job fails on private repository and deletes everything from database #663

Closed
sshipway opened this issue Jan 14, 2016 · 10 comments

Comments

@sshipway

I have Portus running with the webhook working correctly, and API account created. Authentication in registry 2.1 is passed on to Portus which uses LDAP.

When the crono Catalogue synch job runs, it removes all the repositories from the database.

The synch job used to work, until another user logged in, created a group, and a private repository.

Following logs, I can see that what is happening is this:

  • The Catalog job runs on the registry
  • All repositories are correctly identified {"repositories":["registry","sis/rea","sshi052/janitor","sshipway/minecraft-server","sshipway/portus","ubuntu"]}
  • It tries to identify tags, which works for public repositories
  • When calling on the private repository it receives a 404 with content {"errors":[{"code":"NAME_UNKNOWN","message":"repository name not known to registry","detail":{"name":"sis/rea"}}]}
  • This causes the function to immediately exit with an empty array
  • This causes it to synch against an empty array
  • Which causes all the database entries to be deleted.
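The chain above can be reproduced in isolation. What follows is a minimal sketch, not the Portus code: `RESPONSES`, `add_tags_bailing_out` and `repositories_to_delete` are hypothetical names, and the "delete whatever is missing from the catalog" rule is an assumption about how the sync behaves, inferred from the symptoms.

```ruby
require "json"

# Hypothetical stand-in for the registry: maps a tags/list path to [status, body].
RESPONSES = {
  "registry/tags/list" => [200, '{"name":"registry","tags":["2.1.1","2"]}'],
  "sis/rea/tags/list"  => [404, '{"errors":[{"code":"NAME_UNKNOWN"}]}'],
  "ubuntu/tags/list"   => [200, '{"name":"ubuntu","tags":["latest"]}']
}.freeze

# Assumed shape of the failing behaviour: give up on the first non-200 reply.
def add_tags_bailing_out(repositories)
  result = []
  repositories.each do |repo|
    status, body = RESPONSES.fetch("#{repo}/tags/list")
    return [] unless status == 200
    result << JSON.parse(body)
  end
  result
end

# Assumed sync rule: anything in the database that is absent from the catalog is deleted.
def repositories_to_delete(db_names, catalog)
  db_names - catalog.map { |entry| entry["name"] }
end

db_names = ["registry", "sis/rea", "ubuntu"]
catalog  = add_tags_bailing_out(db_names)
doomed   = repositories_to_delete(db_names, catalog)
puts "catalog size: #{catalog.size}, to delete: #{doomed.inspect}"
```

A single 404 in the middle of the list is enough to empty the catalog, so every repository ends up marked for deletion, including the ones whose tags were fetched successfully.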

This seems to have two big problems...

  • First, add_tags should not return immediately if tags are not found for a single repo
  • Secondly, why is it able to identify the repo exists, but not the tags? Is Portus not authenticating back using its own portus account? Is there an issue because I am using LDAP?
@sshipway
Author

A solution for the first part seems to be to make this change to add_tags in lib/portus/registry_client.rb:

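    # Fetch the tag list of each repository, skipping any that cannot be retrieved.
    def add_tags(repositories)
      return [] if repositories.nil?

      result = []
      repositories.each do |repo|
        res = perform_request("#{repo}/tags/list")
        # Keep going on a non-200 reply (e.g. 404 NAME_UNKNOWN) instead of returning [].
        if res.code.to_i == 200
          result << JSON.parse(res.body)
        end
      end
      result
    end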

I would submit this as a pull request, but there are a fairly large number of requirements for doing that, so I'll leave it for one of the devs to pick up.

@mssola
Collaborator

mssola commented Jan 15, 2016

> First, add_tags should not return immediately if tags are not found for a single repo

You are totally right. I'll fix that.

> Secondly, why is it able to identify the repo exists, but not the tags? Is Portus not authenticating back using its own portus account? Is there an issue because I am using LDAP?

Now, this is weird... If you have access to the Portus instance, maybe you could call:

$ rails runner bin/client.rb catalog

What does this print? Moreover, just to be sure: which versions of docker and docker distribution are you using?

@mssola mssola added the bug label Jan 15, 2016
@sshipway
Author

I will run this test Monday.

We are running docker under CoreOS, docker 1.8.3

@wolfch

wolfch commented Jan 20, 2016

I am seeing the same issue. If I docker exec into the db container and connect to mysql, the "repositories" and "tags" tables are empty. I am on RHEL-7.2 and am running docker:
docker-engine-1.9.1-1.el7.centos.x86_64
docker-engine-selinux-1.9.1-1.el7.centos.noarch

If I run the "_catalog" REST API on the registry, I see the images in there. I was also using LDAP, but I tried without LDAP first.
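For anyone wanting to repeat that check by hand, here is a rough sketch against the Registry HTTP API v2 (`/v2/_catalog` for the repository list, `/v2/<name>/tags/list` for tags). The base URL is a placeholder, `registry_uri`, `interpret` and `fetch` are hypothetical helpers, and `fetch` sends no credentials, so a registry behind token authentication would be expected to answer 401 rather than the payloads shown.

```ruby
require "json"
require "net/http"
require "uri"

# Build the API v2 URL for a path such as "_catalog" or "ubuntu/tags/list".
def registry_uri(base, path)
  URI("#{base.chomp('/')}/v2/#{path}")
end

# Turn a status code and JSON body into either the payload or the registry error codes.
def interpret(status, body)
  parsed = JSON.parse(body)
  if status == 200
    { ok: true, data: parsed }
  else
    codes = parsed.fetch("errors", []).map { |error| error["code"] }
    { ok: false, status: status, codes: codes }
  end
end

# Unauthenticated GET against the registry; not called below, so nothing hits the network.
def fetch(base, path)
  res = Net::HTTP.get_response(registry_uri(base, path))
  interpret(res.code.to_i, res.body)
end

# Bodies shaped like the ones quoted earlier in this issue.
catalog = interpret(200, '{"repositories":["registry","sis/rea","ubuntu"]}')
missing = interpret(404, '{"errors":[{"code":"NAME_UNKNOWN","message":"repository name not known to registry","detail":{"name":"sis/rea"}}]}')

puts registry_uri("https://registry.example.test:5000/", "sis/rea/tags/list")
puts catalog[:data]["repositories"].inspect
puts missing[:codes].inspect
```

Comparing the `_catalog` list with the per-repository `tags/list` replies shows which repository the registry lists but then refuses to describe.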

@sshipway
Author

@mssola Sorry for the delay, I ran the test as asked. Unfortunately there have been a number of changes to the repository since my first post (people are actively testing things), so the values are somewhat different from before. The 'fos' namespace is now the private one (same as 'sis' in the original example).
We are now upgraded to Docker 1.9.1, CoreOS 899.
The tags for the fos/rea repo are being shown in the output of the test command as expected.

root@d361129d08a4:/portus# rails runner bin/client.rb catalog
WARNING: Nokogiri was built against LibXML version 2.9.1, but has dynamically loaded 2.8.0
[{"name"=>"fos/rea", "tags"=>["latest"]},
 {"name"=>"registry", "tags"=>["2.1.1", "2"]},
 {"name"=>"sshi052/janitor", "tags"=>["latest"]},
 {"name"=>"sshipway/minecraft-server", "tags"=>["latest"]},
 {"name"=>"sshipway/portus", "tags"=>["2.0.0", "dev", "latest"]},
 {"name"=>"ubuntu", "tags"=>["latest"]}]
"Size: 6"
root@d361129d08a4:/portus#

mssola added a commit to mssola/Portus that referenced this issue Jan 21, 2016
Right now, the `RegistryClient#catalog` method can erase the DB of all the
repos. This happens when, for some unexpected reason, the
`RegistryClient#add_tags` method fails at retrieving one repository. In this
case, before this patch this method just returned an empty array. After this
patch, repositories that are not found will simply not be added, but the
method will go on adding tags to other repositories.

See SUSE#663

Signed-off-by: Miquel Sabaté Solà <[email protected]>
@mssola
Collaborator

mssola commented Jan 21, 2016

@sshipway sorry for the delay from my side too ;) With #672 the first part will be fixed. Moreover, it will be backported to the v2.0 branch, so an eventual (and rather soon) version 2.0.1 of Portus should include this fix.

So, as far as I understand it, the rails runner bin/client.rb catalog command gets it right, doesn't it? Could you take a look at the logs? Besides that, I'll do some investigation of my own :)

@sshipway
Author

Thanks for the update. I've added some more logging lines to log the JSON from catalogue requests when in debug mode so that there will be more info.
Our current PoC setup is changing rapidly, with the registry setup often being rebuilt, which makes it difficult to provide consistent information. For some reason, only certain private repositories are affected, and currently things are working correctly (I hate this sort of bug). However, when I can reliably repeat the problem I'll obtain the same logs and post again.
Looking forward to upgrading to 2.0.1

mssola added a commit that referenced this issue Jan 22, 2016
mssola added a commit to mssola/Portus that referenced this issue Jan 22, 2016
@mssola mssola added this to the Provisioning & General Usage milestone Feb 23, 2016
@boyand

boyand commented Mar 29, 2016

A similar thing could have happened to us recently. We are using 2.0.3 and docker 1.9. All of a sudden we lost all our repositories and the whole database was wiped out. Unfortunately cannot provide more debugging info at this stage but I guess the merge above did not (fully) fix the problem.

What would be the best way to turn debugging for crono?

Also I have found that crono is performing quite slow. We have under 100 repositories with probably 15-20 tags each and it takes about 7 minutes to go through the catalogue job. Is there anything that can be done to improve on that?

@mssola
Collaborator

mssola commented Mar 30, 2016

> A similar thing could have happened to us recently. We are using 2.0.3 and docker 1.9. All of a sudden we lost all our repositories and the whole database was wiped out. Unfortunately cannot provide more debugging info at this stage but I guess the merge above did not (fully) fix the problem.

What version of the registry are you using?

> What would be the best way to turn debugging for crono?

Crono uses the same log level as Portus itself. So, if you are in development mode, it follows the :debug level. Otherwise, if you are in production, you may want to tweak the log level in config/environments/production.rb by changing config.log_level from :info to :debug. After that, restart the crono service (remember to set it back to :info once you're done debugging, since it may produce a lot of data :P). Now you should be able to see more stuff in the logs.
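In a Rails application that setting is the line `config.log_level = :debug` inside the `Rails.application.configure` block of the production environment file. To illustrate what the switch changes, here is a small stand-alone sketch using Ruby's standard `Logger` rather than Portus itself; the two log messages and the `captured_output` helper are made up for the example.

```ruby
require "logger"
require "stringio"

# Return what a logger at the given level writes for one debug and one info line.
def captured_output(level)
  io = StringIO.new
  logger = Logger.new(io)
  logger.level = level
  # Drop timestamps so the output is stable and easy to compare.
  logger.formatter = proc { |severity, _time, _progname, msg| "#{severity}: #{msg}\n" }
  logger.debug("catalog response body") # hypothetical debug message
  logger.info("catalog job finished")   # hypothetical info message
  io.string
end

puts "at :info"
puts captured_output(Logger::INFO)
puts "at :debug"
puts captured_output(Logger::DEBUG)
```

At :info only the second line is written; at :debug both are, which is why the debug level can produce a lot of data.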

> Also I have found that crono is performing quite slow. We have under 100 repositories with probably 15-20 tags each and it takes about 7 minutes to go through the catalogue job. Is there anything that can be done to improve on that?

This is something that we should investigate. Maybe Portus is to blame, maybe the registry (e.g. slow backend), etc.
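If the job fetches tags one repository at a time (an assumption), 7 minutes over fewer than 100 repositories works out to at least 4 seconds per repository on average. One way to narrow down where that time goes is to time each request separately. The sketch below is only an illustration: `fetch_tags` is a hypothetical stand-in that sleeps instead of calling the registry, and `time_requests` is not part of Portus.

```ruby
require "benchmark"

# Hypothetical stand-in for one tags/list request; replace with the real call.
def fetch_tags(repo)
  sleep(0.01)
  { "name" => repo, "tags" => ["latest"] }
end

# Run the given block once per repository and record how long each call took.
def time_requests(repositories)
  repositories.map do |repo|
    elapsed = Benchmark.realtime { yield repo }
    [repo, elapsed]
  end
end

timings = time_requests(%w[registry ubuntu]) { |repo| fetch_tags(repo) }
slowest = timings.max_by { |_repo, elapsed| elapsed }

timings.each { |repo, elapsed| puts format("%-10s %.3fs", repo, elapsed) }
puts "slowest: #{slowest.first}"
```

If the per-request times are uniformly high, the registry or its storage backend is the likelier suspect; if they are small and the job is still slow, the time is being spent on the Portus side.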

@mssola mssola modified the milestones: Before the 2.1 release, Provisioning & General Usage Aug 9, 2016
@mssola mssola modified the milestones: Priorities for 2.2, Release 2.3 Aug 2, 2017
@mssola
Collaborator

mssola commented Jan 18, 2018

Closing in favor of #1599.

@mssola mssola closed this as completed Jan 18, 2018

4 participants