Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v3.4-beta1: YAML import raises AttributeError exception #11000

Closed
peteeckel opened this issue Nov 22, 2022 · 10 comments
Closed

v3.4-beta1: YAML import raises AttributeError exception #11000

peteeckel opened this issue Nov 22, 2022 · 10 comments
Assignees
Labels
beta Concerns a bug/feature in a beta release status: accepted This issue has been accepted for implementation type: bug A confirmed report of unexpected behavior in the application

Comments

@peteeckel
Copy link
Contributor

NetBox version

v3.4-beta1

Python version

3.8

Steps to Reproduce

  1. Click on "IP Addresses", "Import"
  2. Select YAML as the input format
  3. Paste the following valid YAML into the form field:
- address: 10.0.1.1/24
  status: active

Expected Behavior

The data is imported and a new IP address appears in the database

Observed Behavior

An AttributeError exception is raised:

Environment:

Request Method: POST
Request URL: http://192.168.106.105/ipam/ip-addresses/import/

Django Version: 4.1.2
Python Version: 3.8.11
Installed Applications:
['django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'django.contrib.humanize',
 'corsheaders',
 'debug_toolbar',
 'graphiql_debug_toolbar',
 'django_filters',
 'django_tables2',
 'django_prometheus',
 'graphene_django',
 'mptt',
 'rest_framework',
 'social_django',
 'taggit',
 'timezone_field',
 'circuits',
 'dcim',
 'ipam',
 'extras',
 'tenancy',
 'users',
 'utilities',
 'virtualization',
 'wireless',
 'django_rq',
 'drf_yasg',
 'netbox_dns.DNSConfig']
Installed Middleware:
['graphiql_debug_toolbar.middleware.DebugToolbarMiddleware',
 'django_prometheus.middleware.PrometheusBeforeMiddleware',
 'corsheaders.middleware.CorsMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware',
 'django.middleware.security.SecurityMiddleware',
 'netbox.middleware.ExceptionHandlingMiddleware',
 'netbox.middleware.RemoteUserMiddleware',
 'netbox.middleware.LoginRequiredMiddleware',
 'netbox.middleware.DynamicConfigMiddleware',
 'netbox.middleware.APIVersionMiddleware',
 'netbox.middleware.ObjectChangeMiddleware',
 'django_prometheus.middleware.PrometheusAfterMiddleware']



Traceback (most recent call last):
  File "/opt/netbox/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/opt/netbox/lib/python3.8/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/netbox/lib/python3.8/site-packages/django/views/generic/base.py", line 103, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/netbox/netbox/netbox/views/generic/base.py", line 13, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/opt/netbox/netbox/utilities/views.py", line 99, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/opt/netbox/lib/python3.8/site-packages/django/views/generic/base.py", line 142, in dispatch
    return handler(request, *args, **kwargs)
  File "/opt/netbox/netbox/netbox/views/generic/bulk_views.py", line 442, in post
    new_objs = self.create_and_update_objects(form, request)
  File "/opt/netbox/netbox/netbox/views/generic/bulk_views.py", line 364, in create_and_update_objects
    prefetch_ids = [int(record['id']) for record in records if record.get('id')]
  File "/opt/netbox/netbox/netbox/views/generic/bulk_views.py", line 364, in <listcomp>
    prefetch_ids = [int(record['id']) for record in records if record.get('id')]

Exception Type: AttributeError at /ipam/ip-addresses/import/
Exception Value: 'list' object has no attribute 'get'
@peteeckel peteeckel added the type: bug A confirmed report of unexpected behavior in the application label Nov 22, 2022
@jeremystretch jeremystretch added the beta Concerns a bug/feature in a beta release label Nov 22, 2022
@jeremystretch
Copy link
Member

I'm curious whether the expectation from the user perspective is to provide a list of YAML dictionaries, or separate YAML documents representing each object. The YAML loader expects a list of individual documents (separated with ---).

@jeremystretch jeremystretch added the status: under review Further discussion is needed to determine this issue's scope and/or implementation label Nov 22, 2022
@peteeckel
Copy link
Contributor Author

peteeckel commented Nov 22, 2022

Obviously my expectation was to be able to provide the data in a single YAML array :-)

Anyway, after looking at the code I also tried

- address: 10.0.1.1/24
  status: active
---
- address: 10.0.1.2/24
  status: active

(with various different locations of the separators). Always with the same result.

The problem seems to originate in the fact that whether you provide a single YAML document or a list of multiple documents (of which a list containing one document as I did in my minimalised exaxmple), yaml.load_all() always returns the parsed data itself as a list containing one or more lists of dictionaries:

- address: 10.0.1.0/24
  status: active
- address: 10.0.1.1/24
  status: active

[[{'address': '10.0.1.0/24', 'status': 'active'}, {'address': '10.0.1.1/24', 'status': 'active'}]]
- address: 10.0.1.0/24
  status: active
---
- address: 10.0.1.1/24
  status: active

[[{'address': '10.0.1.0/24', 'status': 'active'}], [{'address': '10.0.1.1/24', 'status': 'active'}]]

Iterating over the result will always give lists as items, and obviously .get() won't work on them.

Personally, I'd flatten the list and iterate over all dictionaries, whether they come in one or several YAML documents. I can't really see the advantage of one or the other structure (maybe I'm missing something, though). I normally provide YAML data in one large document, but there may be data sources that use the multi-document approach. It would be sensible to be able to handle both.

@kkthxbye-code
Copy link
Contributor

Just for clarity, the following is the format currently supported:

address: 10.0.1.1/24
status: active
---
address: 10.0.1.2/24
status: active

@peteeckel
Copy link
Contributor Author

Hm. That is probably not what anyone would expect when YAML is mentioned. It definitely is a very special case that does not make too much sense to me.

What I would expect in the first place is a list in one document. That's what most tools using YAML export eject, and it's the canonical output when you give yaml.dump with a datastructure roughly equivalent to a CSV:

#!/usr/bin/env python3

import yaml

struct = [
    { "test1": 1, "test2": 2 },
    { "test1": 2, "test2": 4 },
]

print(yaml.dump(struct))

results in

- test1: 1
  test2: 2
- test1: 2
  test2: 4

The multi-document approach is nothing I see very often in everyday YAML practice. It doesn't hurt to support it, but I obviously didn't even have the idea of trying it. The currently supported notation does, however definitely make more sense than the multi-document form of one list element per document I tried.

Anyway, I'd still go for a list of dictionaries.

@peteeckel
Copy link
Contributor Author

peteeckel commented Nov 23, 2022

I have a suggestion for a minimal code change that would support at least the three variants discussed here:

diff --git a/netbox/utilities/forms/forms.py b/netbox/utilities/forms/forms.py
index 5756cf0e3..c361f13c3 100644
--- a/netbox/utilities/forms/forms.py
+++ b/netbox/utilities/forms/forms.py
@@ -207,12 +207,24 @@ class ImportForm(BootstrapMixin, forms.Form):
 
     def _clean_yaml(self, data):
         try:
-            return yaml.load_all(data, Loader=yaml.SafeLoader)
+            records = list(yaml.load_all(data, Loader=yaml.SafeLoader))
         except yaml.error.YAMLError as err:
             raise forms.ValidationError({
                 self.data_field: f"Invalid YAML data: {err}"
             })
 
+        try:
+            records = sum(records, [])
+        except TypeError:
+            pass
+
+        if any(type(record)!=dict for record in records):
+            raise forms.ValidationError({
+                self.data_field: "Invalid YAML data: A list of dictionaries is required"
+            })
+
+        return records
+
 
 class FilterForm(BootstrapMixin, forms.Form):
     """

The above code can handle three variants of YAML structure:

variant1 = '''
- address: 10.0.1.1/24
  status: active
- address: 10.0.1.2/24
  status: active
- address: 10.0.1.3/24
  status: active
'''

variant2 = '''
- address: 10.0.1.1/24
  status: active
- address: 10.0.1.2/24
  status: active
---
- address: 10.0.1.3/24
  status: active
'''

variant3 = '''
address: 10.0.1.1/24
status: active
---
address: 10.0.1.2/24
status: active
---
address: 10.0.1.3/24
status: active
'''

Update: It was still possible to input valid YAML that caused Exceptions. The code now makes sure that what is returned is a list of dictionaries.

@peteeckel
Copy link
Contributor Author

peteeckel commented Nov 23, 2022

What is feeding my argument as well is that the JSON importer accepts the following data structure:

[
    {
        "address": "10.1.1.1/24",
        "status": "active"
    },
    {
        "address": "10.1.1.2/24",
        "status": "active"
    }
]

This is the exact equivalent of the YAML structure

- address: 10.1.1.1/24
  status: active
- address: 10.1.1.2/24
  status: active

So it really makes sense to support it in addition to the multi-document variant that js supported at the moment. Given the fact that it is currently possible to submit valid YAML structures that cause exceptions trying to parse them _clean_yaml() needs to be revised anyway.

@kkthxbye-code
Copy link
Contributor

kkthxbye-code commented Nov 24, 2022

it's the canonical output when you give yaml.dump with a datastructure roughly equivalent to a CSV:

While I don't have strong feelings either way, if you use yaml.dump_all() it matches the output netbox expects. Which makes sense as we load using yaml.load_all(). Personally I think it would be best to specify one format, but I don't have any good arguments for or against.

Whatever we decide, we should at least support the format that the DeviceType export function exports in, which is currently multiple yaml documents via. dump_all(), so it needs to be changed if we change the format.

@peteeckel
Copy link
Contributor Author

I made an experiment, just to check my own perception (sometimes you live in a strange bubble), and asked a few colleagues what they think of first when they should represent the data structure of multiple records with specific fields as a YAML structure. All of them came up with the 'list of dictionaries' idea, some of them didn't even know that the multi-document approach exists (I did, but it was stored away in some very remote attic of my brain).

What's the rationale behind using dump_all()? Is there any use case that makes dump() impractical or less obvious? I'm trying to figure out the advantage of using what I feel is a slightly exotic format.

Anyway, IMHO it makes sense to be tolerant in what you accept and strict in what you emit - so I'd really prefer the approach outlined above of accepting any structure that represents the record structure that needs to be imported.

By the way, I tried to import

{
    "address": "10.1.5.1/24",
    "status": "active"
},
{
    "address": "10.1.5.2/24",
    "status": "active"
}

(with and without the comma between the dictionaries) as JSON input - both were not accepted. So in JSON the only valid form structure is what would, when parsed as YAML (which is possible, JSON usually is valid YAML as well) be represented as

- address: 10.1.5.1/24
  status: active
- address: 10.1.5.2/24
  status: active

@jeremystretch
Copy link
Member

Hm. That is probably not what anyone would expect when YAML is mentioned. It definitely is a very special case that does not make too much sense to me.

This is the same way the device type YAML import has worked for years.

@peteeckel
Copy link
Contributor Author

This is the same way the device type YAML import has worked for years.

Never mind - I can adjust to that, my point is just that IMHO it is a quite unusual variant of YAML.

The main point of this issue still remains - getting the input format wrong while providing a valid YAML structure should not cause an exception later in the process, but a ValidationError while cleaning the input.

The current code only checks for YAML validity (a test that all of my input variants passed) in _clean_yaml and then an exception is raised later when the parsed data structure does not match the implicitly expected format.

@jeremystretch jeremystretch added status: accepted This issue has been accepted for implementation and removed status: under review Further discussion is needed to determine this issue's scope and/or implementation labels Dec 2, 2022
jeremystretch added a commit that referenced this issue Dec 2, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
beta Concerns a bug/feature in a beta release status: accepted This issue has been accepted for implementation type: bug A confirmed report of unexpected behavior in the application
Projects
None yet
Development

No branches or pull requests

4 participants