Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor performance when validating foreign key relationship on large dataset #805

Closed
louisjasek opened this issue May 24, 2018 · 12 comments
Closed
Labels
env:MacOS This issue is related to running Data Curator on macOS env:Windows This issue is related to running Data Curator on Windows est:Major Major effort to implement fn:Validation priority:High
Milestone

Comments

@louisjasek
Copy link

Current Behaviour (for problems)

When I validate a foreign key relationship on a large dataset, Data Curator stalls. This happens on Windows and Mac

Expected Behaviour

When I validate a foreign key relationship on a large dataset, Data Curator should handle it.

Steps to Reproduce

  1. Open Data Curator using the icon
  2. Download this CSV and remove the first row: https://data.qld.gov.au/dataset/coastal-data-system-near-real-time-storm-tide-data/resource/7afe7233-fae0-4024-bc98-3a72f05675bd
  3. Open the CSV in data curator
  4. Create a new tab with one column called 'H1'. Add values 'abellpoint' and 'weipanx'
  5. Save this new tab as a comma separated csv
  6. Guess columns and fill in mandatory table and package values
  7. On the coastal data table, establish a foreign key relationship between 'site' and 'h1' in the new tab you created/saved.
  8. Click validate
  9. Data Curator stalls

Your Environment

Data Curator version: 0.17.0
Operating System and version: Windows 10 Pro

AND

Data Curator version: 0.17.0
MacOS Sierra 10.12.6

@louisjasek
Copy link
Author

@Stephen-Gates again would you mind please adding priority and milestone. This is another high one for me. Would you also be able to test on your machine to see if you are getting the same result?

@Stephen-Gates Stephen-Gates added this to the v0.18.0 milestone May 24, 2018
@Stephen-Gates Stephen-Gates added env:MacOS This issue is related to running Data Curator on macOS env:Windows This issue is related to running Data Curator on Windows fn:Validation priority:High labels May 24, 2018
@louisjasek
Copy link
Author

louisjasek commented May 24, 2018

Here is another csv file that is causing this feature to stall Data Curator: https://data.qld.gov.au/dataset/crash-data-from-queensland-roads/resource/177dc50c-0cf7-46ba-8a69-99695aeaa46a

I created foreign keys on gender and severity columns, referencing new reference tables I created using a subset of unique values from these columns.

@Stephen-Gates
Copy link
Contributor

Repeated behaviour with tide data :-(

@Stephen-Gates
Copy link
Contributor

Explore upgrading to the latest version of HandsOnTable

@ghost
Copy link

ghost commented Jun 5, 2018

There is some delay due to number of rows to iterate and errors to send back, but we can use a loader here. However the frictionless iterations and Data Curator callbacks are all working fine. The stalling is due to the handsontable comments plugin not handling the large amount of feedback.
Will look at streaming/throttling feedback to plugin and will consider upgrading if time permits.

@Stephen-Gates
Copy link
Contributor

Thanks for the update @mattRedBox I remember you said upgrading handsontable was significant #678

@ghost
Copy link

ghost commented Jun 5, 2018

Have upgraded to 3.0.0 and tests are running correctly.
Just FYI, from what I can see in the commits to hot, the significant changes for us were:

  • cell selection (which we do frequently - fortunately hot have moved the functionality to another method which we could use in its place in our own code)
  • hot's search plugin initialisation - just some boilerplate changes, but obviously a big deal to us given the work we have put into find/replace. Fortunately, having the baseline find acceptance tests in place have helped with checking changes.

Unfortunately, from what I can see in the hot (Handsontable) library, even the latest version does not allow more than a single cell to register a hot comment (which we use to put those little marks in each cell to show validation errors) - which is a pity as they have changed the 'selection' code to enable multiple cell selections and reselections.

With all the other parts in place (streaming errors as they are received to cell so user has immediate feedback - just need to add a loading blockout - should be straightforward), I'll look at pinching the css from handsontable and implementing ourselves so that we can do 'batch' updates of cells, rather than 1 at a time - if successful, this should provide significant improvement in the wait times.

@ghost
Copy link

ghost commented Jun 6, 2018

Ok think I found a compromise within handsontable, similar to what I did in Find/Replace, where I can update array of cell objects, this time adding cell property change - then just need to render the entire hot, bypassing the specific setComment plugin call for an individual cell altogether.

@ghost ghost added the est:Major Major effort to implement label Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
ghost pushed a commit that referenced this issue Jun 6, 2018
@ghost ghost closed this as completed in c0ebdbf Jun 6, 2018
@Stephen-Gates
Copy link
Contributor

I repeated the first example in this issue. The validation completed and speed was acceptable.

screenshot 2018-06-07 06 25 39

screenshot 2018-06-07 06 22 17

Although the errors show in all cells using the triangle, some data doesn't show until you start scrolling.

The data being displayed in the cells is not correct. Only the first 1044 rows are for ablepoint.

screenshot 2018-06-07 06 33 31

@Stephen-Gates
Copy link
Contributor

Another example of the display error after opening the above data from a data package and then validating

screenshot 2018-06-07 07 18 02

@ghost ghost closed this as completed in 7a6fe5b Jun 7, 2018
@ghost
Copy link

ghost commented Jun 7, 2018

Hi @Stephen-Gates
Just needed to add back the default renderer to handsontable function we update to show the error colour in cell - will do another release tonight.

@Stephen-Gates
Copy link
Contributor

Confirmed working

screenshot 2018-06-16 07 35 37

ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
env:MacOS This issue is related to running Data Curator on macOS env:Windows This issue is related to running Data Curator on Windows est:Major Major effort to implement fn:Validation priority:High
Projects
None yet
Development

No branches or pull requests

2 participants