File data sources in Graph Node

High Level Description

A new "kind" of data source will be introduced: file/ipfs. These data sources can be statically defined, based on a content addressable hash, or they can be created dynamically by chain-based mappings.

Upon instantiation, Graph Node will try to fetch the corresponding data from IPFS, based on the Availability Chain's stated availability for the file, if it is configured. If the file is retrieved, a designated handler will be run. That handler will only be able to save entities specified for the data source on the manifest, and those entities will not be editable by main chain handlers. The availability of those entities to query will be determined by the Availability Chain, if configured.

Detailed Specification

Our running example will be a subgraph with the following store information:

type Store @entity {
  id: ID!
  owner: Bytes!
  block: BigInt!
  timestamp: BigInt!
  storage: String!
  productsCount: BigInt!
  productIds: [String!]!
  ordersCount: BigInt!
  products: [Product!] @derivedFrom(field: "store")
  vendors: [Bytes!]!
  collections: [Collection!] @derivedFrom(field: "store")
  collectionIds: [String!]!
  rating: BigInt!
  name: String
  slogan: String
  slug: String
  link: String
  description: String
  deliveryCountries: [DeliveryCountry!]!
}

In this case the name, slogan, slug, link and description are found in an IPFS file, while the ID and other information are sourced from Ethereum. This separation, and the merging at query time, is defined in the subgraph's schema.graphql as follows:

type Store @entity {
  id: ID!
  owner: Bytes!
  block: BigInt!
  timestamp: BigInt!
  storage: String!
  productsCount: BigInt!
  productIds: [String!]!
  ordersCount: BigInt!
  products: [Product!] @derivedFrom(field: "store")
  vendors: [Bytes!]!
  collections: [Collection!] @derivedFrom(field: "store")
  collectionIds: [String!]!
  rating: BigInt!
  metadata: StoreMetadata
  deliveryCountries: [DeliveryCountry!]!
}

type StoreMetadata @entity {
  id: ID!
  name: String
  slogan: String
  slug: String
  link: String
  description: String
  storage: String
  entityId: String
}

We now have an entity which is only dependent on IPFS (StoreMetadata). This can therefore be considered separately for Proof-of-indexing.

A static file data source is then declared as:

- name: StoreMetadataTemplate
  kind: file/ipfs
  mapping:
    apiVersion: 0.0.7
    language: wasm/assemblyscript
    file: ./mappings/metadataHandlers/StoreMetadataHandler.ts
    handler: handleStoreMetadata
    entities:
      - StoreMetadata
    abis:
      - name: Melcor
        file: ./abis/Melcor.json

In this case ipfs is the type of file, indicating that this file is to be found on the IPFS network.
The entities listed under the mapping are important for guaranteeing isolation: entities declared by a file data source should not be accessible from other data sources, and the file data source itself should only create those entities. This should be checked at compile time, and should also fail at run time if the store API is used to update an entity that belongs to a file data source.
There can only be one handler per file data source.
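
As a rough illustration of the isolation rule above, the following sketch checks whether a data source may write a given entity type. The `DataSourceDecl` shape, the `mayWrite` helper, and the chain data source named `Melcor` with kind `ethereum/contract` are all hypothetical; this is not how Graph Node implements the check, only one way the rule could be expressed.

```typescript
// Hypothetical model of a data source as declared in the manifest.
interface DataSourceDecl {
  name: string;
  kind: string; // e.g. "file/ipfs" or "ethereum/contract"
  entities: string[]; // entity types listed under `mapping.entities`
}

// True when the data source is a file data source (kind starts with "file/").
function isFileKind(d: DataSourceDecl): boolean {
  return d.kind.indexOf("file/") === 0;
}

// Returns true when `writer` may save an entity of type `entityType`.
function mayWrite(
  writer: DataSourceDecl,
  entityType: string,
  all: DataSourceDecl[]
): boolean {
  if (isFileKind(writer)) {
    // A file data source may only create the entities it declares.
    return writer.entities.indexOf(entityType) !== -1;
  }
  // A chain data source may not touch entities owned by a file data source.
  return !all.some(
    (d) => isFileKind(d) && d.entities.indexOf(entityType) !== -1
  );
}

// Assumed declarations, loosely based on the manifest fragment above.
const fileDs: DataSourceDecl = {
  name: "StoreMetadataTemplate",
  kind: "file/ipfs",
  entities: ["StoreMetadata"],
};
const chainDs: DataSourceDecl = {
  name: "Melcor",
  kind: "ethereum/contract",
  entities: ["Store"],
};
const declaredSources: DataSourceDecl[] = [fileDs, chainDs];
```

With these declarations, `mayWrite(fileDs, "StoreMetadata", declaredSources)` is allowed, while `mayWrite(chainDs, "StoreMetadata", declaredSources)` and `mayWrite(fileDs, "Store", declaredSources)` are both rejected.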

In our example, the data source would not be static but a template, in which case the source field would be omitted. To instantiate the template the current create API can be used as:

export function SetStoreHandler(event: SetStore): void {
  const id = event.params.storeId.toString();
  let store = Store.load(id);
  if (store === null) {
    store = new Store(id);

    store.owner = event.params.owner;
    store.vendors = [event.params.owner];
    store.productsCount = zeroBigInt;
    store.ordersCount = zeroBigInt;
    store.block = event.block.number;
    store.rating = zeroBigInt;
    store.timestamp = event.block.timestamp;
    store.productIds = [];
    store.collectionIds = [];
  }

  const countries: string[] = [];
  for (let i = 0; i < event.params.deliveryCountries.length; i++) {
    const country = event.params.deliveryCountries[i].toString();
    countries.push(country);
  }
  store.deliveryCountries = countries;

  store.storage = event.params.metadata;
  const metadata = store.storage + "/metadata.json";
  store.metadata = metadata;

  const context = new DataSourceContext();
  context.setString("entityId", id);
  context.setString("storage", store.storage);
  StoreMetadataTemplate.createWithContext(metadata, context);
  store.save();

  let user = User.load(event.params.owner.toHex());
  if (user === null) {
    user = createUser(
      event.params.owner,
      event.block.number,
      event.block.timestamp,
      zeroBigInt
    );
  }
  user.save();
}

The file handler would look like:

export function handleStoreMetadata(content: Bytes): void {
  const storeMetadata = new StoreMetadata(dataSource.stringParam());
  const context = dataSource.context();
  const entityId = context.getString("entityId");
  const storage = context.getString("storage");
  const try_value = json.try_fromBytes(content);

  storeMetadata.entityId = entityId.toString();
  storeMetadata.storage = storage;
  if (try_value.isOk) {
    const value = try_value.value;

    if (value.kind == JSONValueKind.OBJECT) {
      const jsonData = value.toObject();
      const name = jsonData.get("name");
      const description = jsonData.get("description");
      const slogan = jsonData.get("slogan");
      const slug = jsonData.get("slug");
      const link = jsonData.get("link");

      if (name) {
        storeMetadata.name = name.toString();
      }

      if (slogan) {
        storeMetadata.slogan = slogan.toString();
      }

      if (description) {
        storeMetadata.description = description.toString();
      }

      if (slug) {
        storeMetadata.slug = slug.toString();
      }

      if (link) {
        storeMetadata.link = link.toString();
      }
    }

    storeMetadata.save();
  }
}

Note that in this case, the mapping of the Ethereum-based entity to the IPFS-based entity takes place entirely in the Ethereum data mapping, which allows for multiple Ethereum entities to reference the same file-based entity.

It is possible for an identical file data source to be created more than once (e.g. if multiple ERC721s share the same tokenURI on-chain). In this case the corresponding file handler should only be run once.
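
The run-once behaviour could be sketched as a small registry that remembers which file data sources have already been created. The `FileDataSourceRegistry` class is hypothetical, and keying on the template name plus the file path is an assumption (the text above does not say whether the context is part of the identity).

```typescript
// Hypothetical registry that tracks which file data sources already exist.
class FileDataSourceRegistry {
  private seen: { [key: string]: boolean } = {};

  // Returns true only the first time a given (template, file) pair is seen,
  // meaning the handler should be scheduled. Later calls return false.
  register(template: string, file: string): boolean {
    // Assumption: identity is the template name plus the file path.
    const key = template + "\u0000" + file;
    if (this.seen[key] === true) {
      return false;
    }
    this.seen[key] = true;
    return true;
  }
}

const registry = new FileDataSourceRegistry();
const firstCreate = registry.register("StoreMetadataTemplate", "Qm.../metadata.json");
const secondCreate = registry.register("StoreMetadataTemplate", "Qm.../metadata.json");
const otherFile = registry.register("StoreMetadataTemplate", "Qm.../other.json");
```

Here `firstCreate` and `otherFile` are `true` and `secondCreate` is `false`, so the handler for the shared file would be scheduled a single time.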

Indexing IPFS data sources

Indexing behaviour will be dependent on whether Graph Node has been configured with an Availability Chain.

Without an Availability Chain

In the absence of an Availability Chain, when a file data source is created, Graph Node tries to find that file from the node's configured IPFS gateway. If Graph Node is unable to find the file, it should retry several times, backing off over time. On finding a file, Graph Node will execute the associated handler, updating the store. The associated entity updates will not be part of the subgraph PoI.
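
The retry behaviour described above might look like the following sketch. The function names, the doubling schedule, and the concrete delays are assumptions made for illustration; the text only says that Graph Node retries several times and backs off over time. The fetch and wait functions are injected so the sketch can run without a real IPFS gateway.

```typescript
// Delay before the retry that follows attempt number `attempt` (0-based).
// Assumed schedule: doubles on each attempt, capped at `maxMs`.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Calls `fetchFile` up to `maxAttempts` times, waiting between failures.
// `fetchFile` returns null when the file cannot be found.
function fetchWithBackoff(
  fetchFile: () => string | null,
  wait: (ms: number) => void,
  maxAttempts: number,
  baseMs: number,
  maxMs: number
): string | null {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const content = fetchFile();
    if (content !== null) {
      return content; // found: the handler can now be executed
    }
    if (attempt + 1 < maxAttempts) {
      wait(backoffDelayMs(attempt, baseMs, maxMs));
    }
  }
  return null; // gave up after maxAttempts
}

// Usage: a fake gateway that only finds the file on the third attempt.
let gatewayCalls = 0;
const recordedWaits: number[] = [];
const fetched = fetchWithBackoff(
  () => (++gatewayCalls < 3 ? null : "file-bytes"),
  (ms) => {
    recordedWaits.push(ms);
  },
  5,
  1000,
  60000
);
```

In this run the fake gateway is called three times and the recorded waits are 1000 ms and 2000 ms before the file is returned.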

With an Availability Chain

The initial implementation in Graph Node will not include an Availability Chain.

If Graph Node has an Availability Chain configured, when a file data source is created, Graph Node should check the availability of the file in the Availability Chain, and in its configured IPFS Gateway.

If the file is marked as Available by the Availability Chain per the latest block, and the Graph Node is able to find that file, it should process the corresponding handler and update the PoI for updated entities with that latest Availability Chain block. It should then listen for updates to that file's availability, and if the file is marked as unavailable, the resulting entities' availability block range should be closed as of the latest block, and the PoI updated.

If the file is marked as Available by the Availability Chain per the latest block, and the Graph Node is not able to find that file, it should indicate to the Availability Chain that it is not able to find the file, and try again to find the file periodically.

If the file is not tracked by the Availability Chain, or marked as Unavailable, and the Graph Node is able to find the file, it should indicate to the Availability Chain that the file is available. Notably, Graph Node should not proceed with the corresponding handler until the Availability Chain marks the file as Available.

If the file is not tracked by the Availability Chain, or marked as Unavailable, and the Graph Node is not able to find the file, it should indicate to the Availability Chain that the file is not available, and try again to find the file periodically.
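
The four cases above can be summarised as a small decision function. The `ChainStatus` and `Decision` types and the `decide` function are hypothetical names introduced for this sketch; it only restates the rules in the preceding paragraphs and leaves out the follow-up step of listening for later availability changes.

```typescript
// State of a file according to the Availability Chain at its latest block.
type ChainStatus = "available" | "unavailable" | "untracked";

interface Decision {
  // Execute the file handler and update the PoI.
  runHandler: boolean;
  // Signal to send to the Availability Chain, if any.
  report: "available" | "unavailable" | null;
  // Look for the file again periodically.
  retryLater: boolean;
}

function decide(status: ChainStatus, foundLocally: boolean): Decision {
  if (status === "available") {
    return foundLocally
      ? { runHandler: true, report: null, retryLater: false }
      : { runHandler: false, report: "unavailable", retryLater: true };
  }
  // Untracked or unavailable: the handler must wait until the
  // Availability Chain marks the file as available.
  return foundLocally
    ? { runHandler: false, report: "available", retryLater: false }
    : { runHandler: false, report: "unavailable", retryLater: true };
}

const availableAndFound = decide("available", true);
const untrackedButFound = decide("untracked", true);
```

Note that the handler runs in exactly one of the four cases: the file is marked as available and Graph Node can find it.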

Interacting with the store

File data source mappings can only load entities created by chain data sources, and only up to the chain block at which the file data source was created.
Chain data source mappings cannot interact with file data source entities in any way.
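
A minimal sketch of these two access rules, assuming each stored entity version records which kind of data source wrote it and at which chain block. The `EntityVersion` shape and both helper functions are hypothetical.

```typescript
// Which kind of data source wrote the entity (assumed bookkeeping).
type EntityOrigin = "chain" | "file";

interface EntityVersion {
  origin: EntityOrigin;
  block: number; // chain block at which this version was written
}

// A file handler may read chain entities, but only as of the chain block
// at which its file data source was created.
function fileHandlerCanLoad(entity: EntityVersion, creationBlock: number): boolean {
  return entity.origin === "chain" && entity.block <= creationBlock;
}

// A chain handler may not read or write file data source entities.
function chainHandlerCanAccess(entity: EntityVersion): boolean {
  return entity.origin === "chain";
}

const storeAtBlock90: EntityVersion = { origin: "chain", block: 90 };
const storeAtBlock120: EntityVersion = { origin: "chain", block: 120 };
const metadataEntity: EntityVersion = { origin: "file", block: 100 };
```

For a file data source created at block 100, `fileHandlerCanLoad(storeAtBlock90, 100)` holds, while `storeAtBlock120` is too recent and `metadataEntity` is off limits to chain handlers.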

Querying subgraphs using file-based data sources

At query time, Graph Node should be aware of whether the query requests data that includes file data source entities.

In our example:

## No file-based data source entities:
fragment StoreInfo on Store {
  id
  storage
  owner
  block
  timestamp
  productsCount
  ordersCount
  rating
  name
  description
  deliveryCountries {
    id
    name
  }
}

## Including data from file-based entities:
fragment StoreInfo on Store {
  id
  storage
  owner
  block
  timestamp
  productsCount
  ordersCount
  rating
  metadata {
    name
    description
  }
  deliveryCountries {
    id
    metadata {
      name
    }
  }
}

query getStoreById($id: ID!) {
  store(id: $id) {
    ...StoreInfo
  }
}

If a query requires data from a file data source, and Graph Node has an Availability Chain configured, the query will need to provide an Availability Chain block hash in addition to the Ethereum block hash.
If no Availability Chain block hash is provided, the default will be the latest that Graph Node is aware of, similar to current treatment of Ethereum block hashes in Graph Node. Note that such a query is not deterministic.
Graph Node may not be able to support all Availability Chain blocks, depending on when it started indexing, and should refuse to serve queries if the stated Availability Chain block is out of range.
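
The defaulting and range check described above could be sketched as follows. For simplicity the sketch identifies Availability Chain blocks by number rather than by hash, which is an assumption; `IndexedRange` and `resolveAvailabilityBlock` are hypothetical names.

```typescript
// Availability Chain blocks this node can serve (assumed bookkeeping).
interface IndexedRange {
  first: number; // first block covered since indexing started
  latest: number; // latest block the node is aware of
}

// Resolves which Availability Chain block a query runs against.
// Throws when the requested block is outside the indexed range.
function resolveAvailabilityBlock(
  requested: number | null,
  range: IndexedRange
): number {
  if (requested === null) {
    // No block given: default to the latest known block.
    // Such a query is not deterministic.
    return range.latest;
  }
  if (requested < range.first || requested > range.latest) {
    throw new Error(
      "Availability Chain block " + requested + " is out of range"
    );
  }
  return requested;
}

const servedRange: IndexedRange = { first: 100, latest: 200 };
const defaultBlock = resolveAvailabilityBlock(null, servedRange);
const pinnedBlock = resolveAvailabilityBlock(150, servedRange);
```

A query pinned to a block inside the range is deterministic with respect to availability, whereas one that relies on the default may return different results as the Availability Chain advances.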
