
Importing the same dat file to Controller_Event_Log table causes duplicate table entries #221

Open
dbrandesky opened this issue Jun 22, 2023 · 2 comments

@dbrandesky
I'm currently running ATSPM 4.2.4, as I have a somewhat older server setup and 4.3.1 wouldn't work. I'm also using my own script to retrieve dat files from EOS controllers, placing them into folders named with the signal ID. I have no issues importing the data using NewDecodeandImportASC3Logs. However, if I forget to remove the dat files from the signal ID directory after they've been imported and run NewDecodeandImportASC3Logs again over the same files, I start seeing odd behavior in the signal reports. Most notably, the Purdue Phase Termination chart shows "Unknown" for nearly all of the phase terminations that coincide with SPM data that was imported more than once.

I didn't dive too deep into the ATSPM code, as I'm not a C# guy, but I do know that in SQL it's quite possible to INSERT into a table conditionally, checking that the row doesn't already exist before inserting it. Is there a reason this wasn't done in ATSPM? I specifically do not want the dat files deleted from my controllers, so that means I have to manually manage which files get brought into my ATSPM instance to avoid duplication.
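To illustrate the kind of conditional insert I mean, here's a T-SQL sketch against the Controller_Event_Log columns used elsewhere in this thread. The `@...` parameters are hypothetical stand-ins for one decoded event; this is only an illustration, not the actual ATSPM import code:

```sql
-- Sketch: insert one event only if an identical row isn't already there.
-- @SignalID, @Timestamp, @EventCode, @EventParam are assumed parameters.
INSERT INTO [dbo].[Controller_Event_Log]
       ([SignalID], [Timestamp], [EventCode], [EventParam])
SELECT @SignalID, @Timestamp, @EventCode, @EventParam
WHERE NOT EXISTS (SELECT 1
                  FROM [dbo].[Controller_Event_Log]
                  WHERE [SignalID]  = @SignalID
                    AND [Timestamp] = @Timestamp
                    AND [EventCode] = @EventCode
                    AND [EventParam] = @EventParam);
```

Note that a per-row existence check like this gets expensive at bulk-import volume, which is presumably why it wasn't done.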

@dloweAvenueConsultants
Collaborator

We used to have a unique constraint on the table. You could still try this, but our dataset is so large that the server could not perform the uniqueness check and keep up with the inserts.
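For anyone who wants to try the constraint approach despite the insert-throughput cost, a sketch (the constraint name is made up; the column list matches the dedup query later in this thread):

```sql
-- Sketch: unique constraint on the event identity columns.
-- IGNORE_DUP_KEY = ON silently discards duplicate rows at insert time
-- instead of raising an error and aborting the batch.
ALTER TABLE [dbo].[Controller_Event_Log]
ADD CONSTRAINT UQ_Controller_Event_Log
UNIQUE ([SignalID], [Timestamp], [EventCode], [EventParam])
WITH (IGNORE_DUP_KEY = ON);
```

Adding this to an already-duplicated table will fail until the existing duplicates are cleaned out first.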

Using temp tables was also considered, but the solution we looked at was more of a daily merge into the production table. That did not meet the requirement to get the data as close to real time as possible.

Originally, when we were trying to address this in the charts, it was determined that an occasional duplicate shouldn't affect a chart too much. At the time we did not have as many charts, and I can see how this would be a bigger issue now, since many charts look for events in a specific order. Duplicates cause these to fail, with Phase Termination categorizing everything as Unknown. You could write a script to clean out the duplicates, but I don't currently have anything.

This will be addressed in version 5.0. The check will be done in memory as part of the compressed object.

@dbrandesky
Author

dbrandesky commented Jun 22, 2023

Thanks for the quick answer. I can appreciate how this gets far more complicated over a large dataset.

For the sake of others with the same issue, I did some googling and found a way to find and delete duplicates using a common table expression (CTE) in Microsoft SQL Server:

WITH CTE ([SignalID],
          [Timestamp],
          [EventCode],
          [EventParam],
          DuplicateCount)
AS (SELECT [SignalID],
           [Timestamp],
           [EventCode],
           [EventParam],
           ROW_NUMBER() OVER (PARTITION BY [SignalID],
                                           [Timestamp],
                                           [EventCode],
                                           [EventParam]
                              ORDER BY [Timestamp]) AS DuplicateCount
    FROM [MOE].[dbo].[Controller_Event_Log])
DELETE FROM CTE
WHERE DuplicateCount > 1;
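If you'd rather inspect what would be removed before deleting anything, the same CTE works with a SELECT instead of the DELETE (a read-only preview; it doesn't modify the table):

```sql
-- Preview the duplicate rows the DELETE above would remove.
WITH CTE AS
    (SELECT [SignalID],
            [Timestamp],
            [EventCode],
            [EventParam],
            ROW_NUMBER() OVER (PARTITION BY [SignalID],
                                            [Timestamp],
                                            [EventCode],
                                            [EventParam]
                               ORDER BY [Timestamp]) AS DuplicateCount
     FROM [MOE].[dbo].[Controller_Event_Log])
SELECT *
FROM CTE
WHERE DuplicateCount > 1;
```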
