Source MongoDB: Failed to fetch Schema #14246
Could it be possible that this is because of the large number of collections? Could you help us with more information?
|
This has been happening to me for quite some time, with roughly 40 collections. The old Ruby connector doesn't have this issue |
Actually, I can no longer get the schema on either version. Both are still running loads just fine; schema discovery fails |
@JCWahoo I can't reproduce this issue. I created a dataset of 40 collections and 100,000 documents in each, and the schema discovery is successful |
@VitaliiMaltsev We have more than 40 collections, some with millions of documents. No log messages are returned when using the v2 connector so I've got nothing to go on. The old Ruby connector returns an error around using mapReduce on a view when validating schema... Not sure if that is helpful or not. When I watch Mongo during schema discovery in v2, it appears the connector is retrieving more than 10k documents for schema evaluation |
@JCWahoo just tested a dataset of 100 collections with 1.5 million documents in each, and the schema discovery is still successful |
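For anyone trying to reproduce a dataset of this scale (many collections of nested documents), a minimal sketch using pymongo could look like the following. The URI, database name, and document shape are illustrative assumptions, not the reporter's actual data.

```python
def make_doc(i):
    # Hypothetical document with a couple of levels of nesting, loosely
    # modeled on the kind of structure discussed in this thread.
    return {
        "seq": i,
        "profile": {"name": f"user-{i}", "tags": ["a", "b"]},
        "nested": {"level1": {"level2": {"value": i * 2}}},
    }

def populate(uri="mongodb://localhost:27017", db_name="schema_test",
             n_collections=100, n_docs=1_500_000, batch=10_000):
    # Requires pymongo and a running MongoDB instance; the import is kept
    # inside the function so make_doc() can be used standalone.
    from pymongo import MongoClient
    client = MongoClient(uri)
    db = client[db_name]
    for c in range(n_collections):
        coll = db[f"coll_{c}"]
        for start in range(0, n_docs, batch):
            docs = [make_doc(i) for i in range(start, min(start + batch, n_docs))]
            coll.insert_many(docs)
```

Inserting in batches of ~10k keeps memory bounded while still being reasonably fast for millions of documents.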
@VitaliiMaltsev I know, it's frustrating not having any error in the logs. It's an Atlas cluster that I'm connecting to via the replica shard in standalone mode. The collections have several layers of nesting within documents. Happy to provide any more detail I can; I've been blocked from making any updates to the Mongo connection because the schema discovery continues to fail. The old connector and new connector are still pulling data hourly without issue, I just can't change them. |
@JCWahoo in that case, could you please provide me with more information about your source so I can try to reproduce it again?
|
@VitaliiMaltsev Sure thing. For simplicity/privacy we'll call the database "Production". There are 80 collections; the largest is approximately 50 million documents. I'm using production as the database name for sync and admin as the authentication source. My user has readAnyDatabase @ admin and read @ local permissions in Mongo. The largest collection has around 20 fields, some with several layers of nesting such as
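As a sanity check on the permissions side, the effective roles of the connecting user can be inspected with MongoDB's `connectionStatus` command. A hedged sketch; the URI and the role-checking helper below are illustrative assumptions, not part of the connector:

```python
def authenticated_roles(uri):
    # Requires pymongo and a live connection; returns the roles MongoDB
    # reports for the currently authenticated user.
    from pymongo import MongoClient
    client = MongoClient(uri)
    status = client.admin.command({"connectionStatus": 1, "showPrivileges": True})
    return status["authInfo"]["authenticatedUserRoles"]

def has_read_any_database(roles):
    # roles is a list of {"role": ..., "db": ...} dicts, as returned above.
    # Checks for the readAnyDatabase @ admin grant mentioned in this thread.
    return any(r["role"] == "readAnyDatabase" and r["db"] == "admin" for r in roles)
```

If `has_read_any_database(authenticated_roles(uri))` is false, schema discovery would be failing for permission reasons rather than data-shape reasons.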
|
Please provide an example of the document as JSON, with all levels of nesting and the same field names as your largest collection |
Can I email it to you rather than here? |
Sure, my email is [email protected] |
@JCWahoo I tried to reproduce this problem with all your recommendations (80 collections, the biggest one with 50 million documents and the same data structure as you sent me) with no success. In my environment schema discovery works well
|
Great news, thanks!
…On Wed, Oct 5, 2022, 12:23 PM VitaliiMaltsev wrote:
@JCWahoo I tried to reproduce this problem with all your recommendations (80 collections, the biggest one with 50 million documents, and the same data structure as you sent me) with no success. In my environment schema discovery works well.
On the other hand, I found two potential bottlenecks that could lead to this problem.
1. Potential issue with Mongo Atlas latency. Try changing the region of your cluster as described here: https://www.mongodb.com/docs/atlas/tutorial/move-cluster/
2. For each collection, the formation of the final result displayed in the UI is done using a normal loop, which can be quite slow if you have many collections/documents. I created a PR (#17614) to parallelize this process, which will significantly improve performance.
Hope this helps :)
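The per-collection loop described in point 2 can be parallelized along the lines of the sketch below. This is an illustrative toy, not the actual connector code from PR #17614: the per-collection "schema" here is just a map of field names to observed Python type names.

```python
from concurrent.futures import ThreadPoolExecutor

def discover_one(name, sample_docs):
    # Toy per-collection schema: field name -> set of observed type names.
    fields = {}
    for doc in sample_docs:
        for key, value in doc.items():
            fields.setdefault(key, set()).add(type(value).__name__)
    return name, fields

def discover_all(samples, max_workers=8):
    # samples: {collection_name: [sampled documents]}.
    # Running collections concurrently hides per-collection latency,
    # which is the idea behind parallelizing the discovery loop.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(discover_one, samples.keys(), samples.values())
    return dict(results)
```

With 80+ collections, processing them concurrently rather than one-by-one can cut total discovery time roughly by the worker count when the work is I/O-bound.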
|
@JCWahoo I just released mongodb-source version 0.1.19. Please check it out and try to refresh the schema once more |
Still no luck; I will share logs and what it looks like on the Mongo side... it appears to be getting stuck on a nested array/object.
…On Fri, Oct 7, 2022 at 7:23 AM VitaliiMaltsev wrote:
@JCWahoo I just released mongodb-source version 0.1.19. Please check it out and try to refresh the schema once more.
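Since discovery appears to stall on nested arrays/objects, it may help to picture what a naive recursive type-walker has to do for deeply nested documents. A minimal sketch, hypothetical and not the connector's actual implementation:

```python
def infer_shape(value, max_depth=10):
    # Recursively map a BSON-like value (dicts, lists, scalars) to a
    # type description. Deeply nested arrays of objects multiply the
    # work, which is one plausible way a naive walker bogs down on
    # large documents; max_depth bounds the recursion defensively.
    if max_depth == 0:
        return "truncated"
    if isinstance(value, dict):
        return {k: infer_shape(v, max_depth - 1) for k, v in value.items()}
    if isinstance(value, list):
        return [infer_shape(v, max_depth - 1) for v in value]
    return type(value).__name__
```

Without a depth or element bound, a collection whose documents hold large arrays of nested objects forces the walker to visit every element of every sampled document.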
|
@JCWahoo I have one more hypothesis: your collections may contain documents with different structures within the same collection. This may be causing your issue. |
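To illustrate the "different structure within the same collection" point: two documents in one collection can legitimately disagree on which fields exist and what types they hold, and discovery then has to merge them. A toy sketch of such a merge, illustrative only:

```python
def doc_fields(doc):
    # Field name -> set containing the observed type name for one document.
    return {k: {type(v).__name__} for k, v in doc.items()}

def merge_fields(a, b):
    # Union the field/type maps of two documents. Fields present in only
    # one document and fields with conflicting types both survive here,
    # which is exactly what makes emitting a single fixed schema hard.
    merged = dict(a)
    for key, types in b.items():
        merged[key] = merged.get(key, set()) | types
    return merged
```

A field that resolves to `{"int", "str"}` after merging is the kind of ambiguity a discovery step either has to widen (e.g. to a union type) or choke on.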
Thanks - any docs/guidance on that last bit? Not sure I've done that before, or it's been so long I don't recall |
I believe our team needs to implement this task so that you can choose this option yourself in the UI
I created a source (MongoDB) and a destination (BigQuery) and checked the connection; at this step everything was OK. Afterwards, I set up a new connection and started fetching the data schema from Mongo; this operation lasted about 30 minutes and then failed. Can you please tell me whether this is due to the large number of collections or a data format issue on the Mongo side?