BDP data sets show up on OCEAN Acquarius instance #643

Closed
TimDaub opened this issue Jun 4, 2021 · 17 comments
Labels
Status: WontFix This will not be worked on Type: Question Further information is requested

Comments

@TimDaub

TimDaub commented Jun 4, 2021

Hello,

for previous launches over at https://market.bigdataprotocol.com, those assets didn't show up on https://market.oceanprotocol.com. That's an assumption e.g. the crawler of rugpullindex.com was built on.
Since last night, however, VORSTA-2 is showing up both on BDP's market and OCEAN's.

For the crawler this is an issue, as I have UNIQUE constraints in my database on the DID. I'll do a workaround for now, but I'd like to know what perspective the Ocean team has. Is this a bug? Or will BDP data sets show up on the Ocean marketplace in the future? But then, why don't other BDP data sets show up yet?
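One possible workaround for the UNIQUE constraint is to make uniqueness cover the pair of DID and source instance rather than the DID alone. The sketch below is only an illustration using an in-memory SQLite database. The table layout, the column names, and the helpers `open_db`, `record_asset`, and `sources_for` are assumptions, not the actual rugpullindex.com schema.

```python
import sqlite3
from typing import List


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical assets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE assets (
            did    TEXT NOT NULL,
            source TEXT NOT NULL,  -- which Aquarius instance returned the DDO
            name   TEXT,
            UNIQUE (did, source)   -- same DID may appear once per source
        )
        """
    )
    return conn


def record_asset(conn: sqlite3.Connection, did: str, source: str, name: str) -> None:
    """Insert the asset, replacing an earlier row for the same (did, source)."""
    conn.execute(
        "INSERT OR REPLACE INTO assets (did, source, name) VALUES (?, ?, ?)",
        (did, source, name),
    )


def sources_for(conn: sqlite3.Connection, did: str) -> List[str]:
    """List the sources that have reported a given DID, sorted by name."""
    rows = conn.execute(
        "SELECT source FROM assets WHERE did = ? ORDER BY source", (did,)
    )
    return [row[0] for row in rows]
```

With this layout, crawling the same DID from both markets yields two rows instead of a constraint violation, and re-crawling one market simply overwrites its own row.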

@TimDaub TimDaub changed the title BDP data sets show up on ocean acquarius instance BDP data sets show up on OCEAN Acquarius instance Jun 4, 2021
@kremalicious
Contributor

kremalicious commented Jun 4, 2021

This is up to the marketplace owner. BDP decided to isolate their assets without forking the contracts, which is done by encrypting the DDO upon publishing. On first look, it seems the encrypt flag was not used for this dataset, so it will be picked up by any Aquarius instance, and not just the BDP Aquarius instance. The pool/price is not showing up because our subgraph excludes all pools without OCEAN (as of right now).

@TimDaub
Author

TimDaub commented Jun 4, 2021

OK, but what I'd like to understand is the perspective here. What is Aquarius? And what is OP planning to do with it?

When calling Aquarius with an encrypted BDP data set GET | https://aquarius.mainnet.oceanprotocol.com/api/v1/aquarius/assets/ddo/did:op:94EF613E8c91e372347da00fac486ed787e84B5d, on the UI the market answers with:

Could not retrieve asset

[asset] The DDO for did:op:94EF613E8c91e372347da00fac486ed787e84B5d was not found in MetadataCache. If you just published a new data set, wait some seconds and refresh this page.

The Aquarius request says:

did:op:94EF613E8c91e372347da00fac486ed787e84B5d asset DID is not in OceanDB

What is the MetadataCache? Is it another name for Aquarius? If so, isn't Aquarius the sole storage for Ocean DDOs as e.g. a publish transaction only submits the DDO's hash (also called DID?) on-chain.

I quote you @kremalicious here:

If Aquarius can be easily made into showing correct numbers, we might release a new version, but in general we are moving away from having price info in the DDO completely. It was always a workaround, never standardized, and will not fit our multi-pool future anyway, so it was decided to scale back Aquarius to what it was supposed to be, a cache for the metadata standardized in our DDO schema.

In case though, the DDO is solely stored on Aquarius, I think it's quite important that any party interacting with "the ocean protocol" has the equal quality of access to its data, no? It's not a cache. It's the source of data. After all, I couldn't launch my own Aquarius instance to track OCEAN assets, right?

I know that argument is now off-topic for oceanprotocol/market, but I think the statement is a valid concern anyway.

@kremalicious
Contributor

kremalicious commented Jun 4, 2021

haha OceanDB, all those legacy names floating around across the stack. Aquarius is a metadata cache so yes Aquarius === MetadataCache, in code we usually refer to it as metadataCache to make its use more clear.

Aquarius watches the metadata contract events where the DDO is stored on-chain, and puts them in its database. You could remove Aquarius, and query the metadata contract directly from the client for various events but of course with lots of performance implications.

From the Aquarius readme:

Aquarius runs an events monitor that watches the:

  • MetadataCreated event from the Metadata smartcontract
  • Reads the events data argument, decompresses the metadata json object then runs schema validation before saving it to the database

So the source of truth for the DDO is that data argument on-chain. That Data gibberish is the DDO. (Think that readme is not complete, as also MetadataUpdate event is watched, @calina-c or @DMATS or @alexcos20 know more about exact details)
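To make the "decompress, then validate" step from the readme excerpt concrete, here is a minimal sketch in Python. It assumes the `data` argument is an LZMA-compressed JSON document and uses a stand-in check (a JSON object with an `id` field) in place of real schema validation. The function name `decode_metadata` and the sample DDO are made up for illustration, and this is not Aquarius's actual code.

```python
import json
import lzma


def decode_metadata(data: bytes) -> dict:
    """Decompress an LZMA payload and parse it as a DDO-like JSON object."""
    raw = lzma.decompress(data)
    ddo = json.loads(raw.decode("utf-8"))
    # Stand-in for schema validation: require a JSON object with an "id" key.
    if not isinstance(ddo, dict) or "id" not in ddo:
        raise ValueError("payload is not a DDO-like JSON object")
    return ddo


# Round trip with a made-up DDO to show the shape of the data flow.
sample = {"id": "did:op:0000", "service": []}
payload = lzma.compress(json.dumps(sample).encode("utf-8"))
decoded = decode_metadata(payload)
```

A client that skips Aquarius would run something like this for every metadata event it reads from the contract, which is where the performance implications mentioned above come from.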

In a perfect decentralized world Aquarius should not exist, but the best we have for a decentralized blockchain indexer is The Graph. Yet there were some blockers for using it for the metadata contract so we are left with Aquarius still. And aquarius.mainnet.oceanprotocol.com is simply run by OPF, but other marketplaces have chosen to run their own instances/forks.

In general, the market UI displays all assets published with our metadata contract. So by default assets published through different UIs will show up in all those UIs. But marketplaces can choose to encrypt that DDO with their own key upon publishing which only their Aquarius has, so only their Aquarius can decrypt the DDO. That is for use cases where forking and deploying contracts might not be desired.

So going back to the dataset in question, BDP chose to define for their marketplace, which is a fork of the UI and Aquarius only:

  • BDP data set === data set published into the same metadata contract, but with encrypted DDO/own Aquarius

And did:op:177311f057Bc9B56165947F7465E0E239024FD2d was simply not encrypted, so all UIs will show it because Aquarius picks it up. Interestingly enough, the publisher could now choose to create an OCEAN pool here in the general market.

As for the assumption for your crawler, it is technically completely right. But it now needs to account for what might have been user error or some bug: either assume it won't happen again and only deal with this asset, or cross-reference between the 2 different Aquarius instances.

@TimDaub
Author

TimDaub commented Jun 4, 2021

So the source of truth for the DDO is that data argument on-chain. That Data gibberish is the DDO. (Think that readme is not complete, as also MetadataUpdate event is watched,

ah ok. I think it's great that you're putting the DDO on-chain. Since we're already on the topic and since I'm lazy: Is the message sender of a metadata contract transaction always the data set publisher via e.g. Metamask? I'm asking as another option could be that e.g. the Ocean market sends this data periodically to reduce gas.

Anyways, since DDO is on-chain I agree that Aquarius is a cache then.

With regards to encryption:

So by default assets published through different UIs will show up in all those UIs. But marketplaces can choose to encrypt that DDO with their own key upon publishing which only their Aquarius has, so only their Aquarius can decrypt the DDO

Do you mean like actual encryption, e.g. that other Aquarius instances cannot decipher what others have put as a DDO? And do I understand correctly that this encryption is set up between a user A and an Aquarius instance X (private key holder)?

If so, why did you end up using encryption? To me, determining the responsibility of which Aquarius instance is supposed to index what DDO could also be flagged otherwise (e.g. by adding a marketplace identifier (DDO: of "OCEAN protocol")). I assume all of this is not standardized. Anyways, if e.g. it should be the case that encryption was used to introduce a hard constraint for particular Aquarius instances to disallow crawling meta data, I think a flag should be favored in the future. If privacy of communication between Aquarius and a user is indeed a concern, encryption makes sense.

As for the assumption for your crawler, it is technically completely right. But now needs to account for what might have been user error. Either assuming it won't happen again

This sounds rather unrealistic to me. If the metadata contract allows adding data points permissionless, I'll have to assume that users will put all sorts of things in there. As for now, I've taken the above-outlined problem as a case of precedent to adjust my crawler.

or cross-reference between the 2 different Aquarius instances.

A work around for now could be to favor the DDO that includes price data...
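A minimal sketch of that cross-referencing idea, assuming each Aquarius instance returns DDOs as dictionaries with an `id` field and an optional `price` object. Both field names and the helper `merge_by_did` are assumptions for illustration.

```python
from typing import Dict, Iterable


def merge_by_did(*sources: Iterable[dict]) -> Dict[str, dict]:
    """Merge DDO lists from several instances, keeping one DDO per DID.

    When the same DID appears more than once, a DDO carrying price data
    replaces one without it; otherwise the first DDO seen is kept.
    """
    merged: Dict[str, dict] = {}
    for records in sources:
        for ddo in records:
            did = ddo["id"]
            current = merged.get(did)
            if current is None or (not current.get("price") and ddo.get("price")):
                merged[did] = ddo
    return merged
```

The result does not depend on the order in which the instances are crawled, as long as at most one of them carries price data for a given DID.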

@TimDaub
Author

TimDaub commented Jun 4, 2021

Another thought: Why do unencrypted DDOs show up on the Ocean Marketplace? What's the reason for taking that approach. I'm curious as in the situation where a third party wanted to also use the encryption logic with this hub and spokes model that was mentioned here, it wouldn't work as implicitly "unencrypted = ocean protocol market picks up meta data"?

@yappy876

yappy876 commented Jun 4, 2021

We had the same issue: we noticed the dataset was on both BDP and OCEAN, causing all sorts of problems.

@TimDaub
Author

TimDaub commented Jun 4, 2021

@yappy876 is doing https://oystershell.io/

@yappy876

yappy876 commented Jun 4, 2021

@yappy876 is doing https://oystershell.io/

Yes thank you for that. Forgot to mention...

We are doing a workaround in the meantime. I also spoke to Peter Chen from BDP about the issue, so they are informed.

@jerryCide

Another thought: Why do unencrypted DDOs show up on the Ocean Marketplace? What's the reason for taking that approach. I'm curious as in the situation where a third party wanted to also use the encryption logic with this hub and spokes model that was mentioned here, it wouldn't work as implicitly "unencrypted = ocean protocol market picks up meta data"?

We (OysterShell) are curious to hear more about this. In addition, if this is being allowed going forward it would be of value to see what are the policies surrounding this so we can code appropriately

@alexcos20
Member

As a market owner, you have 2 choices:

  • publish unencrypted DDO -> any aquarius will cache that asset -> you will get network effect, because many markets will sell that asset
  • publish encrypted DDO -> only aquarius instances that have the private key will cache that asset -> you will get to control which assets are displayed by your market, but you will lose the network effect

An aquarius instance can be configured to control which kinds of DDO it caches. So, again, you have the following options:

  • allow all assets -> this includes unencrypted DDO + encrypted DDO with aquarius's private key
  • allow only encrypted assets -> your aquarius will cache only encrypted DDO with aquarius's private key, and it will NOT cache unencrypted ones
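The two options above amount to a small decision rule, sketched here in Python. The names `CacheMode` and `should_cache` are hypothetical and do not correspond to actual Aquarius configuration keys.

```python
from enum import Enum


class CacheMode(Enum):
    ALLOW_ALL = "allow_all"            # unencrypted + decryptable encrypted DDOs
    ENCRYPTED_ONLY = "encrypted_only"  # only decryptable encrypted DDOs


def should_cache(mode: CacheMode, encrypted: bool, can_decrypt: bool) -> bool:
    """Decide whether an instance in the given mode caches a DDO."""
    if encrypted:
        # An encrypted DDO is only usable by an instance holding the key.
        return can_decrypt
    # Unencrypted DDOs are cached only when the instance allows all assets.
    return mode is CacheMode.ALLOW_ALL
```

Under this rule, an unencrypted DDO is picked up by every instance running in the allow-all mode, which matches the behaviour discussed in this thread.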

@alexcos20
Member

alexcos20 commented Jun 7, 2021

Regarding the BDP dataset, the following flow happened:

  • they publish the DDO, using the encrypted flag
{ address: '0x1a4b70d8c9DcA47cD6D0Fb3c52BB8634CA1C0Fdf',
   blockHash:
    '0xc64846f16b007c3fe831fae8a61ad0f6bf9bace9e6523ac56a9f2497e0a15218',
   blockNumber: 12561700,
   logIndex: 123,
   removed: false,
   transactionHash:
    '0xdfddd2b7885ce332824034fc626e189a5c6672a26b61170cf2eed114e03ed021',
   transactionIndex: 75,
   event: 'MetadataCreated',
   id: 'log_482fc557',
   returnValues:
    Result {
      '0': '0x177311f057Bc9B56165947F7465E0E239024FD2d',
      '1': '0x89717015882D6460e4A0daeB945B3D4032f2D9D6',
      '2': '0x02',
      '3':

the parameter at index 2 in the event (called flags) is 0x02 -> means that the DDO is encrypted

All good, but they wanted to edit the metadata:

{ address: '0x1a4b70d8c9DcA47cD6D0Fb3c52BB8634CA1C0Fdf',
    blockHash:
     '0xa1ad4f646ced2ca7929160d3d8f1ae9d14bb7a169e9131aaa8aba48973c41f73',
    blockNumber: 12561735,
    logIndex: 95,
    removed: false,
    transactionHash:
     '0x155805c26acaaea2f8e4008fa99d8c918d5add8905fe27e63e0cf80774e38eca',
    transactionIndex: 68,
   event: 'MetadataUpdated',
    id: 'log_5da6a2eb',
    returnValues:
     Result {
       '0': '0x177311f057Bc9B56165947F7465E0E239024FD2d',
       '1': '0x89717015882D6460e4A0daeB945B3D4032f2D9D6',
       '2': '0x01',

but they forgot to encrypt (flags = 0x01 means the DDO is just lzma compressed, no encryption).
Of course, any aquarius instance configured to cache all assets would pick that up and cache it
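The sequence above can be replayed in a few lines of Python. This sketch only knows the two flag values mentioned in this thread (0x01 and 0x02) and treats anything else as unknown. The helper names `describe_flags` and `publicly_readable` are made up for illustration.

```python
from typing import Iterable, Tuple

# Flag values as described in this thread; other values are not covered here.
FLAG_MEANINGS = {
    0x01: "compressed",  # lzma compressed only, readable by anyone
    0x02: "encrypted",   # encrypted for a particular Aquarius instance
}


def describe_flags(flags_hex: str) -> str:
    """Map a flags value such as '0x02' to a short description."""
    return FLAG_MEANINGS.get(int(flags_hex, 16), "unknown")


def publicly_readable(events: Iterable[Tuple[str, str]]) -> bool:
    """Replay (event name, flags) pairs in order; the last event decides."""
    state = "unknown"
    for _name, flags_hex in events:
        state = describe_flags(flags_hex)
    return state == "compressed"


# The two events from the logs above, in block order.
events = [("MetadataCreated", "0x02"), ("MetadataUpdated", "0x01")]
```

Replaying only the first event leaves the DDO encrypted, while replaying both leaves it readable by any instance that caches all assets.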

@alexcos20
Member

ah ok. I think it's great that you're putting the DDO on-chain. Since we're already on the topic and since I'm lazy: Is the message sender of a metadata contract transaction always the data set publisher via e.g. Metamask? I'm asking as another option could be that e.g. the Ocean market sends this data periodically to reduce gas.

the publisher calls the metadata contract via Metamask. And there is only one simple check: the publisher has to be the minter of the datatoken: https://github.com/oceanprotocol/contracts/blob/main/contracts/metadata/Metadata.sol#L32 for both create & update methods. Thus, we are preventing unauthorized access

@alexcos20
Member

So the source of truth for the DDO is that data argument on-chain. That Data gibberish is the DDO. (Think that readme is not complete, as also MetadataUpdate event is watched,

ah ok. I think it's great that you're putting the DDO on-chain. Since we're already on the topic and since I'm lazy: Is the

If so, why did you end up using encryption? To me, determining the responsibility of which Aquarius instance is supposed to index what DDO could also be flagged otherwise (e.g. by adding a marketplace identifier (DDO: of "OCEAN protocol")). I assume all of this is not standardized. Anyways, if e.g. it should be the case that encryption was used to introduce a hard constraint for particular Aquarius instances to disallow crawling meta data, I think a flag should be favored in the future. If privacy of communication between Aquarius and a user is indeed a concern, encryption makes sense.

Imagine that you DO want your siloed marketplace. And instead of using encryption, you will use a flag: marketplace = "Tim".
What is going to stop me from adding that flag to my DDO, publishing it using my ocean.js/py, and having your marketplace pick up that asset and make it available?

The flow right now is the following:

  • prepare your DDO
  • call your aquarius and ask it: this is my DDO, please encrypt it (as a publisher, you will never get the private key). Aquarius will return the encrypted DDO
  • you publish the encrypted DDO

Of course, there is always the question: how can I protect my aquarius so others cannot use that endpoint? Simple: you can firewall your aquarius (makes sense for a siloed marketplace), or you can protect the encrypt endpoint (nginx, etc)

@alexcos20 alexcos20 added Status: WontFix This will not be worked on Type: Question Further information is requested labels Jun 7, 2021
@TimDaub
Author

TimDaub commented Jun 8, 2021

@alexcos20, thank you for such a detailed response. I appreciate it.

Now, I found your arguments with regard to the encryption strategy quite interesting. Here are the facts we were able to gather so far:

  • All DDOs are stored on-chain on a MetadataContract
  • Aquarius is framed as a cache that reads on-chain data and makes them available via a HTTP API
  • Aquarius has an encryption feature that allows a publisher to ask for the on-chain data to be encrypted through Aquarius.
  • The reason for not implementing a "belongs-to-marketplace-x" flag is that forbidding other marketplaces to crawl the Metadata anyways cannot be enforced without encrypting the data.
  • Ocean Protocol's marketplace Aquarius indexes any Metadata as long as it's not encrypted
  • rugpullindex.com and oystershell have had bugs in their systems as they weren't aware of the properties of this mechanism
  • The Aquarius Metadata contract mechanism isn't standardized

Now, to the assumptions that may have led to this design:

Imagine that you DO want your siloed marketplace.
Of course, there is always the question: how can I protect my Aquarius so others cannot use that endpoint? Simple: you can firewall your Aquarius (makes sense for a siloed marketplace), or you can protect the encrypt endpoint (nginx, etc)

As someone that only has an external view of Ocean Protocol's development, these assumptions don't make sense to me. I haven't talked to the Big Data Protocol guys, but I don't believe that they are necessarily interested in encrypting their on-chain data. You've called it a "silo'ed" marketplace and I don't believe that this is their design intention. From their actions, they clearly want to distinguish themselves from OP. They have a different web design, community, etc. But IMO firewalling their Aquarius instance or firewalling access to the metadata of their data sets makes no economic sense to me. It's in BDP's interest to get as many eyes on their data sets as possible to find potential investors in their data tokens. Hence, it'd make no sense for them to lock down the metadata of any data set.

I understand that there may be users of the OP marketplace code that I'm not aware of. But for public marketplaces, I'm failing to see a rational reason for encrypting metadata. In any case, given the current design that uses XHR requests from a browser to Aquarius, I anyways don't see a solid technical design for locking down an API. Unless you want to go full Twitter...

Why am I making such a fuss about it? I do it, as I think this is a flawed implementation. As a heavy Aquarius user, I appreciate that someone is caching on-chain data. My long-term goal, however, is to observe the data closest to its source - that is on-chain.

I'm happy to see that all metadata can be read from an on-chain contract and I'm planning to take advantage of this as soon as possible. But here are my concerns:

  • I see the Metadata contracts as a core element of the Ocean protocol. As I'd rely on them as an API, I hence expect them to be standardized as a mechanism, e.g. by OEP or through some already existing EIP.
  • I find encrypting meta data for data tokens to be a harmful approach for a speculative issue. After all, you didn't mention a case where a marketplace illegally crawled another one's metadata. I don't find your argument in this regard convincing and I'd only find the encryption solution compelling if we had a precedent for this problem.
  • From rugpullindex.com's crawler perspective, I'm a bit anxious now. People from OP have said that Aquarius will undergo changes. To me, it's unclear where it's headed. Similarly, the same uncertainty is now true for the Metadata contracts. Shall I rely on them or will everyone start to encrypt them?
  • For a marketplace operator that has control over an Aquarius instance, it may make sense to classify Aquarius as a "cache". However, if all data sets as in BDP's case are encrypted on-chain then for a user like rugpullindex that has no access to this encrypted on-chain data, Aquarius is NOT a cache. It's the only possible source of information and hence a STORE.
  • Finally, even if we at RPI chose to now index on-chain data ourselves, how'd we be able to index the data sets from BDP? In DMs, BDP leadership has already said that they like what we're doing with rugpullindex and that they're interested in supporting us. I cannot, however, expect them to share with me the key for their Aquarius instance.

So now what should I do? Read on-chain contracts and for BDP and other marketplaces, hope that their Aquarius remains public?

To me, a new feature seems appropriate here: Introduce an affiliation flag for marketplaces that want to reserve the ability to display the metadata on their marketplace while allowing others to see it on-chain.

@TimDaub
Author

TimDaub commented Jun 8, 2021

Re: GitHub labels: This is not a type: question. This is a bug. My assumptions/expectations as a user have been broken.

@alexcos20
Member

As someone that only has an external view of Ocean Protocol's development, these assumptions don't make sense to me. I haven't talked to the Big Data Protocol guys, but I don't believe that they are necessarily interested in encrypting their on-chain data. You've called it a "silo'ed" marketplace and I don't believe that this is their design intention. From their actions, they clearly want to distinguish themselves from OP. They have a different web design, community, etc. But IMO firewalling their Aquarius instance or firewalling access to the metadata of their data sets makes to me no economic sense. It's in BDP's interest to get as many eyes on their data sets as possible to find potential investors in their data tokens. It'd make no sense, hence, for them to lock down the metadata of any data set.

My bad, I forgot to mention the 3rd option: Aquarius has an ENV var which specifies the allowed list of publishers.
And this is what BDP is using: encryption (so their assets are not visible in other marketplaces) + list of publishers (all their datasets are published by one address). Their Aquarius is public, you can query it.
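As a rough illustration of such an allow-list, the sketch below parses a list of publisher addresses and checks an event's publisher against it. The comma-separated format, the helper names, and the choice to treat an empty list as "no restriction" are all assumptions. The actual variable name and format are defined by Aquarius and are not shown in this thread.

```python
from typing import Set


def parse_allowed_publishers(raw: str) -> Set[str]:
    """Parse a comma-separated address list (assumed format) into a set."""
    return {addr.strip().lower() for addr in raw.split(",") if addr.strip()}


def is_allowed(publisher: str, allowed: Set[str]) -> bool:
    """Check a publisher address against the allow-list, ignoring case.

    An empty allow-list is treated here as "no restriction" (an assumption).
    """
    return not allowed or publisher.lower() in allowed
```

Combined with encryption, a filter like this lets an instance index only the datasets published by one known address.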

I understand that there may be users of the OP marketplace code that I'm not aware of. But for public marketplaces, I'm failing to see a rational reason for encrypting metadata. In any case, given the current design that uses XHR requests from a browser to Aquarius, I anyways don't see a solid technical design for locking down an API. Unless you want to go full Twitter...

Agreed, the keyword here is "public marketplaces"

Why am I making such a fuss about it? I do it, as I think this is a flawed implementation. As a heavy Aquarius user, I appreciate that someone is caching on-chain data. My long-term goal, however, is to observe the data closest to its source - that is on-chain.

I'm happy to see that all metadata can be read from an on-chain contract and I'm planning to take advantage of this as soon as possible.

I can point you to Aqua or help you with that in nodejs

But here are my concerns:

  • I see the Metadata contracts as a core element of the Ocean protocol. As I'd rely on them as an API, I hence expect them to be standardized as a mechanism, e.g. by OEP or through some already existing EIP.
  • I find encrypting meta data for data tokens to be a harmful approach for a speculative issue. After all, you didn't mention a case where a marketplace illegally crawled another one's metadata. I don't find your argument in this regard convincing and I'd only found the encryption solution compelling if we had precedence of this problem
  • From rugpullindex.com's crawler perspective, I'm a bit anxious now. People from OP have said that Aquarius will undergo changes. To me, it's unclear where it's headed. Similarly, the same uncertainty is now true for the Metadata contracts. Shall I rely on them or will everyone start to encrypt them?

It's up to the market owner. We have many use cases: public markets, markets that are controlling the list of publishers, siloed markets and private markets.

  • For a marketplace operator that has control over an Aquarius instance, it may make sense to classify Aquarius as a "cache". However, if all data sets as in BDP's case are encrypted on-chain then for a user like rugpullindex that has no access to this encrypted on-chain data, Aquarius is NOT a cache. It's the only possible source of information and hence a STORE.

Yes, this is true. Again, the market owner decides this

  • Finally, even if we at RPI would choose to now index on-chain data ourselves. How'd we be able to index the data sets from BDP? In DM's BDP leadership has already said that they like what we're doing with rugpullindex and that they're interested in supporting us. I can, however, not expect from them to share with me their key for their Aquarius instance.

That's the point. You cannot do that.

So now what should I do? Read on-chain contracts and for BDP and other marketplaces, hope that their Aquarius remains public?

To me, a new feature seems appropriate here: Introduce an affiliation flag for marketplaces that want to reserve the ability to display the metadata on their marketplace while allowing others to see it on-chain.

But there are use cases where they DO NOT want others to see the data.

@alexcos20
Member

alexcos20 commented Jun 8, 2021

  • From rugpullindex.com's crawler perspective, I'm a bit anxious now. People from OP have said that Aquarius will undergo changes. To me, it's unclear where it's headed. Similarly, the same uncertainty is now true for the Metadata contracts. Shall I rely on them or will everyone start to encrypt them?

We will deploy the next version soon. I can give a brief list of changes:

  • price object is removed from ddo. Why? Because we already have all this in the graph, and there is no need to maintain 2 pieces of code, in different languages.
  • multi network: instead of caching one chain, you can have your Aquarius instance monitor several chains at the same time. Two major advantages:
  1. market can display all assets at the same time, you don't need to change network anymore (you have to for consumption)
  2. simplify devops. Multiple networks -> only one instance deployed

That's the feature list.

No changes to the metadata contract.

@TimDaub TimDaub mentioned this issue Jan 19, 2022
37 tasks
5 participants