Duplicated Records from OData pagination #288
Comments
Hi @victorygit, there is not much that is actionable on the library side here. I do not have access to your service, nor any kind of reproducible steps. You also did not provide any Python snippet ("_next" pagination? skip and top?), which, even without the service traffic, would still be only half of the problem.
I would recommend first isolating whether the problem is even on the pyodata side or whether the duplicates truly exist in the service. The "not sure if it is due to many project changes in the system" tells me you do not actually know that.
I will assume you only have the option to use the API and are not working inside SAP. Still, it should be possible to check whether the data are really duplicates (e.g. the same primary key in two paginated responses) or data that merely look like duplicates but actually differ in their primary keys (duplicates permitted in the DB).
You can, for example, load all the data from the range you are paginating over using curl or the JS sister library odata-library (https://github.com/SAP/odata-library) without pagination, just to check for the existence of duplicates. We first need to at least distinguish whether pyodata is returning something different from the service than other methods do (and is therefore buggy), or is returning exactly the same thing as any other API client.
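For illustration, a minimal sketch of the kind of snippet and duplicate check being asked for. It uses pyodata's client-driven $skip/$top paging as a stand-in for the "_next" flow (which depends on code not shown in the thread); the service URL and the key properties personIdExternal/userId are assumptions, not details from this issue:

```python
import pyodata
import requests

# Hypothetical endpoint; substitute the real SuccessFactors OData URL.
SERVICE_URL = 'https://example.com/odata/v2'

session = requests.Session()  # configure auth here, e.g. session.auth = (user, pwd)
client = pyodata.Client(SERVICE_URL, session)

PAGE_SIZE = 1000
seen = set()
duplicates = []

total = client.entity_sets.EmpEmployment.get_entities().count().execute()

for offset in range(0, total, PAGE_SIZE):
    page = (client.entity_sets.EmpEmployment
            .get_entities()
            .skip(offset)
            .top(PAGE_SIZE)
            .execute())
    for record in page:
        # The key properties here are an assumption about the EmpEmployment entity.
        key = (record.personIdExternal, record.userId)
        if key in seen:
            duplicates.append(key)
        seen.add(key)

print(f'{len(duplicates)} duplicated keys out of {len(seen)} unique')
```

Downloading the same range with curl or odata-library, without paging, would then give an independent baseline to compare against.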
I can confirm we are using server-side pagination ("_next"), and the duplication does happen: I see the same records coming from different page downloads. I still need to test the code more closely; however, I did run the API in Postman directly and there is no duplication there.
Hope it helps.
Thanks
Victor
OK. If possible, could you create a failing test from your investigation that reproduces the issue? E.g. start with tests/test_service_v2.py, line 2239, at commit b6223d1: https://github.com/SAP/python-pyodata/blob/b6223d1a88b85918fb0dbc9757767c880c84bcfe/tests/test_service_v2.py#L2239
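A rough skeleton for such a test, in the style of the responses-mocked tests in that file; the two page payloads below are placeholders standing in for the real responses captured from the service, and the Employees set with its ID key is the fixture entity the existing tests use:

```python
import responses

@responses.activate
def test_pagination_returns_unique_records(service):
    # Two pages as the server would answer successive $skip/$top requests.
    # ID 23 appearing on both pages mimics the duplication reported here;
    # for a real reproduction, paste the actual captured service payloads.
    pages = ([{'ID': 23}, {'ID': 24}], [{'ID': 23}, {'ID': 25}])
    for page in pages:
        responses.add(
            responses.GET,
            f"{service.url}/Employees",
            json={'d': {'results': page}},
            status=200)

    ids = []
    for offset in (0, 2):
        employees = (service.entity_sets.Employees.get_entities()
                     .skip(offset)
                     .top(2)
                     .execute())
        ids.extend(emp.ID for emp in employees)

    # Fails whenever the same key shows up on two different pages.
    assert len(ids) == len(set(ids))
```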
I am not sure how to recreate this issue without connecting to our server, but here is what I am finding:
I am using OData to extract the entity EmpEmployment. I count the records before extracting, and the number is 84484, which matches the number I get when I run $count in Postman. Then I extract the data page by page and add all the pages to one dataframe. The total count of the dataframe matches; however, when I check the individual records, I find the same records duplicated across different pages.
Is there any other testing or setting I should try?
Thanks
Victor
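One consequence worth checking: if the dataframe's total row count matches $count but some rows are duplicated, then other records must be missing entirely. A small pandas check along these lines would confirm both (the key columns are assumed, as the real EmpEmployment key is not shown in the thread):

```python
import pandas as pd

# Stand-in for the dataframe assembled from all downloaded pages;
# the key columns are assumed properties of EmpEmployment.
df = pd.DataFrame(
    [{'personIdExternal': 'p1', 'userId': 'u1'},
     {'personIdExternal': 'p2', 'userId': 'u2'},
     {'personIdExternal': 'p1', 'userId': 'u1'}])  # duplicate from another page

key_cols = ['personIdExternal', 'userId']
dupes = df[df.duplicated(subset=key_cols, keep=False)]
print(f'{len(dupes)} rows share their key with another row')

# If the total row count matches $count but duplicates exist, the
# duplicates are crowding out records that were never downloaded:
missing = len(df) - len(df.drop_duplicates(subset=key_cols))
print(f'{missing} records were never downloaded')
```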
Hi Victor, I have received a notification about a comment saying "Now I found it could be due to the filter with lastModifiedDateTime; when I have the filter as below, I got duplicated records", but I no longer see that comment. Is it no longer a valid clue?
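For reference, a filter of the kind mentioned would look roughly like this in pyodata; the expression and date literal are hypothetical reconstructions, since the original comment (and its exact filter) was deleted:

```python
# Hypothetical reconstruction of the deleted comment's filter; the actual
# expression and date value from that comment are unknown.
request = (client.entity_sets.EmpEmployment
           .get_entities()
           .filter("lastModifiedDateTime ge datetimeoffset'2024-01-01T00:00:00Z'"))
employees = request.execute()
```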
Also, I expect that you are using the standard requests library. If you check that the log does not contain authorization information or any other sensitive data, you can attach it to the issue as well. See e.g. this article: https://proxiesapi.com/articles/logging-and-debugging-with-requests
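A minimal sketch of the usual recipe for tracing requests traffic (the linked article covers the same idea). Note that debuglevel=1 echoes headers, including Authorization, to stdout, so scrub those before attaching anything:

```python
import http.client
import logging

# Echo raw request/response lines and headers to stdout.
http.client.HTTPConnection.debuglevel = 1

# Also surface urllib3's DEBUG records (connections, retries, redirects).
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('urllib3').setLevel(logging.DEBUG)
```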
Hello,
We are leveraging the pyodata library to extract data from SAP SuccessFactors, and recently we found that we get some duplicate data from the EC module during the project. We had not noticed this for other modules, and we are not sure if it is due to the many project changes in the system. Is this a known issue, and how can we avoid these duplicated records?
We are using server-side pagination, and I see the same records show up in different page downloads.
Thanks
Victor