Smart Waterloo Region Innovation Lab - Data Playground Framework

A modular network of networks enabling secure and anonymous data sharing using decentralized Ledger technology

Nov 01, 2022

Data Playground Framework

Last Updated: June, 2022

Presented by

Krish Mehta, Lead Blockchain Architect

Sankalp Narula, Blackhat Thinker

Saba Oji, Data & Knowledge Lead

Introduction:

In today’s world, the most under-appreciated commodity is data. The growing push towards a decentralized, hyper-locally connected world makes data the true asset of the future. Unfortunately, in the current technology infrastructure data is extremely centralized, and the people who create data do not own it. We need a better system.

Problems we’re trying to solve:

Three big problems we’re trying to solve are:

  • Ownership – Data should never be centralized and everyone should be in complete control of their own data.
  • Exchange – Big Data exchange platforms are not built to be discreet or secure, and they lack a universal source of truth.
  • Universal Hyper-local connectivity – True verified data is hyper-local, but the problem with communities is the sheer lack of data discipline.

Solution we propose:

The idea of the “Data Playground” Project is to create a replicable, modular network of networks that enables data sharing in the most secure and anonymous way using decentralized ledger technology. The modularity of the framework will allow us to create an interconnected data exchange platform, while having multiple such networks will make data more accurate and more structured, starting right at the grass-roots level.

Layer of each modular network

To understand how data ownership and anonymity work, we need to understand the different layers of each such network. Figure 1 gives a general overview of one such network, so let’s deep-dive into these layers.

Figure 1 – Different layers of any modular network ![ref1]

User Layer (Decentralized Identities)

A decentralized, or Self-Sovereign, Identity is your digital identity. It comprises your credentials, each of which can be verified either by you or by the issuer of the credential. The nature of the data is then determined by who verifies[^1] that credential. These credentials are stored in your digital wallet, and the keys to access this data are managed using a Decentralized Key Management System (DKMS)[^2]. Using DKMS means that credentials no longer have to be generated by a single central authority, and it gives users the right to create and own their data. As a user, you then have the power to decide who you want to share your data with, and to revoke that access.

Storage Layer (Decentralized Cloud Storage)

Once users have a digital identity, their data needs to be stored in a secure, accessible and retrievable fashion, for which we use decentralized cloud storage. This is where data ownership is achieved: user data is completely encrypted, and only the user, through their digital identity, can decrypt and access it. The advantage of this technology over more common practices is that its decentralized nature leaves no single point of failure. Data is fragmented and stored across multiple nodes[^3].
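To make the idea of fragmenting data across nodes more concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not the storage protocol of any particular network: the names `fragment`, `place_on_nodes` and `reassemble`, the tiny chunk size, and the round-robin placement are all hypothetical. The input is assumed to be already encrypted by the user's wallet, so encryption itself is left out.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def fragment(ciphertext: bytes, chunk_size: int = CHUNK_SIZE):
    """Split already-encrypted bytes into chunks keyed by their SHA-256 digest.

    Returns (manifest, store): the manifest is the ordered list of digests,
    the store maps each digest to its chunk.
    """
    manifest = []
    store = {}
    for start in range(0, len(ciphertext), chunk_size):
        chunk = ciphertext[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        store[digest] = chunk
    return manifest, store


def place_on_nodes(store: dict, node_count: int) -> list:
    """Assign chunks round-robin to hypothetical storage nodes (plain dicts)."""
    nodes = [dict() for _ in range(node_count)]
    for index, (digest, chunk) in enumerate(store.items()):
        nodes[index % node_count][digest] = chunk
    return nodes


def reassemble(manifest: list, nodes: list) -> bytes:
    """Fetch every chunk by digest, check its integrity, and join the chunks."""
    parts = []
    for digest in manifest:
        chunk = next(node[digest] for node in nodes if digest in node)
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("chunk failed its integrity check")
        parts.append(chunk)
    return b"".join(parts)
```

In this sketch no single node holds the whole file, and the manifest of digests is what the owner would need to keep in order to find and verify the pieces again. Real networks add replication so that losing one node does not lose data, which this sketch does not attempt.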

Blockchain Layer (Users, Chain, Smart Contracts, Validators)

This layer includes the validators, the business logic (smart contracts), permissions, and transaction records. In a large multi-party system, it is necessary to have one trustless source of truth. Distributed ledger technology such as a blockchain allows us to create such a source, while also helping manage the private relay and exchange of data. The business logic of who can verify and who can access data is all set up on-chain[^4] through this layer. We expect to build this layer on Hyperledger Fabric, which uses a permissioned, voting-based consensus protocol[^5].
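As a rough illustration of what a permissioned, voting-based source of truth means, the sketch below appends a record to a hash-linked ledger only when a majority of known validators approve it. This is a generic toy model, not Hyperledger Fabric's actual protocol or API: the `Ledger` class, the simple-majority threshold, and the validator names are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Ledger:
    validators: list                      # identities allowed to vote (permissioned)
    blocks: list = field(default_factory=list)

    @staticmethod
    def _hash(block: dict) -> str:
        # Canonical JSON so the same block always hashes to the same value.
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    def propose(self, record: dict, approvals: set) -> bool:
        """Append the record only if more than half of the validators approve."""
        valid_votes = approvals & set(self.validators)  # ignore unknown voters
        if 2 * len(valid_votes) <= len(self.validators):
            return False
        prev_hash = self._hash(self.blocks[-1]) if self.blocks else "0" * 64
        self.blocks.append({"prev_hash": prev_hash, "record": record})
        return True

    def is_consistent(self) -> bool:
        """Check that every block still points at the hash of its predecessor."""
        return all(
            current["prev_hash"] == self._hash(previous)
            for previous, current in zip(self.blocks, self.blocks[1:])
        )
```

Because each block stores the hash of the one before it, editing an old record changes that hash and breaks the link, which is what lets every participant check the shared history independently.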

Data Request Simulation

To understand these different layers, let’s simulate a data request and the path the data follows. The main stakeholders of any one modular network, and the layers they interact with, are users, who interact with the Storage Layer (green arrows), and organizations, which not only run the Blockchain Layer (the network) but also request data (purple arrows).

Figure 2 – Data Delivery Path

Following Figure 2, let’s say an organization decides to start a network using our framework, and users of that organization or community join the network. Users store their data off-chain using decentralized cloud storage[^6]. It’s essential to understand that this approach does not limit the amount of data that can be stored, whereas storing data on-chain is a less scalable and more expensive option.

The data is stored completely encrypted, and only the user has the ability to access it. In addition, a cryptographic hash of the data is stored on-chain. This allows us to use existing protocols[^7] to prove the integrity and continued existence of the data over time.

Now, let’s say the organization creates a data request to all users of its network (validated and added as a new transaction on-chain). This data request is sent to a permission smart contract, which asks each user whether they want the organization to get access to their data. If the user allows it, they decrypt their data and a Zero-Knowledge Proof-based[^8] response is sent to the organization through a private “subnet”[^9]. A transaction record is then stored on-chain, and only the user and the organization can see the private details of that transaction. We are also developing the ability to revoke an organization’s access to data[^10], which, when selected, would close the private data channel highlighted in yellow in Figure 2.
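The request path described above can be sketched in a few lines of Python. This is a simplified model under our own assumptions, not smart-contract code for any real chain: `PermissionContract`, its method names, and the in-memory `log` standing in for on-chain transaction records are all hypothetical, and the Zero-Knowledge Proof response and private channel are not modelled.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PermissionContract:
    data_hashes: dict = field(default_factory=dict)  # user -> hash of their stored data
    grants: set = field(default_factory=set)         # (user, organization) pairs
    log: list = field(default_factory=list)          # stand-in for on-chain records

    def register(self, user: str, stored_bytes: bytes) -> None:
        """Record only the hash of the user's encrypted, off-chain data."""
        self.data_hashes[user] = hashlib.sha256(stored_bytes).hexdigest()
        self.log.append(("register", user, ""))

    def verify(self, user: str, stored_bytes: bytes) -> bool:
        """Check off-chain data against the hash recorded at registration."""
        return self.data_hashes.get(user) == hashlib.sha256(stored_bytes).hexdigest()

    def request(self, organization: str, user: str, user_consents: bool) -> bool:
        """Log the request; open access only if the user says yes."""
        self.log.append(("request", user, organization))
        if user_consents:
            self.grants.add((user, organization))
            self.log.append(("grant", user, organization))
        return user_consents

    def revoke(self, organization: str, user: str) -> None:
        """Close the organization's access to this user's data."""
        self.grants.discard((user, organization))
        self.log.append(("revoke", user, organization))

    def can_access(self, organization: str, user: str) -> bool:
        return (user, organization) in self.grants
```

The point of the sketch is the division of labour: the ledger side holds hashes, permissions and an audit trail, while the data itself never leaves the user's encrypted storage unless the user consents.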

We call this a “network of networks” because what we have described so far is just one replicable network. There could be multiple such networks, and inter-network communication could then be facilitated through cross-chain bridges[^11], similar to how they are currently used in decentralized finance. These bridges will allow us to connect different networks whenever an inter-network data exchange needs to take place, and will provide an added layer of anonymity for the users.

Now, let’s look at a couple of different networks and how they could be brought together with our framework.

Verified Data vs Unverified Data:

A good way to compare the two main types of data is the analogy of a resume. A resume could include projects that an individual has completed in their own time to demonstrate proficiency in a particular piece of software, and it could also include certifications they have earned from recognized organizations. The certifications fall under the verified data category, whereas the projects are unverified data. For an interviewer, both are important in order to judge a candidate holistically. To understand how we deal with these different forms of data in a user-centric model, let’s take a few use-cases.

Possible Use-Cases:

Non-Verified Data Sharing

In today’s world, a constant stream of real-time data has real value. Unlike the verified data examples in this section, the situation below is a simpler one that relies on unverified data supplied directly by the user.

Let's say a co-working space has the budget to build a new amenities room, but the management is not sure what the businesses in the space require. In that case, the management could initiate a network request to conduct a survey across all the separate business networks, and all users within each of the networks could vote and provide their input. This valuable data could then be used to build an amenities room that people need rather than merely want, allowing for the best use of resources.

Verified Data Example:

Healthcare

Taking an example from the healthcare industry, if all your verified health-related records are stored in your decentralized cloud storage, your doctor, wherever they are based, could directly request that you share those records, and you could decide to share that data with your doctor only when needed. This also allows for more consistency in records. For example, if you have an injury and you go to a walk-in clinic, the doctor could give you a prescription and ask you to consult a specialist. Because of this trail of records, the specialist at the hospital could request data from you, and you could share the prescription given by the doctor at the walk-in clinic. No matter which hospital you go to, or which network you join, the consistency of all your data could make healthcare more connected and allow doctors to reach a diagnosis faster.

Another, large-scale use: if the health ministry ever needed data to understand the growing trends of a certain disease, it could contact all hospital networks, which could then contact all their members to get data. With the right data, the health ministry could then allocate more money to research on the disease.

Survey

Organization X is gathering data on its employees and checking whether they have a valid driver’s license issued by the provincial government. The first step is for Organization X to request the data (driver’s license) from its employees. Through their wallet, the users receive a notification with the details of the request (which organization is asking, and what information is being requested). The wallet is the middleman between the cloud-based storage and the organization; it is where the data is encrypted and decrypted with access keys. Employees who have previously used their digital wallets to store their driver’s license data, encrypted, in the cloud storage can then share the decrypted file with the organization.

There are two types of information an organization can ask for: binary (yes/no; i.e., the organization only cares whether the employee has a driver’s license or not) or non-binary (i.e., the organization needs the driver’s license number). If the organization requires binary information, a Zero-Knowledge Proof lets it reach a yes/no conclusion without receiving any of the underlying information. Once the user decides to share the requested information, blockchain technology is used to create a private data channel between the user and the organization for the data transfer.
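The difference between a binary and a non-binary request can be shown with a small Python sketch of how a wallet might answer each one. Note the limits of the example: it only shows what is disclosed in each case and is not a Zero-Knowledge Proof, since a real proof would also let the organization check the yes/no answer cryptographically. The names `DataRequest` and `respond`, and the credential key used in the usage lines, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class DataRequest:
    organization: str   # who is asking
    credential: str     # which credential the request is about
    binary: bool        # True: yes/no only; False: the value itself


def respond(wallet: dict, request: DataRequest, consent: bool) -> Optional[Union[bool, str]]:
    """Answer a request from the wallet, disclosing as little as the request needs."""
    if not consent:
        return None                              # user declined: nothing is shared
    if request.binary:
        return request.credential in wallet      # only "has it / does not have it"
    return wallet.get(request.credential)        # the actual value, by user's choice


# Usage with made-up data: the same wallet answers both kinds of request.
wallet = {"drivers_license_number": "EXAMPLE-0000"}
has_license = respond(wallet, DataRequest("Organization X", "drivers_license_number", True), consent=True)
license_number = respond(wallet, DataRequest("Organization X", "drivers_license_number", False), consent=True)
```

In the binary case the organization learns a single bit, so the license number never has to leave the wallet.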

Sensitive Information Sharing

The City of Y is trying to collect information on the number of residents who currently use Supervised Injection Site services. To do this, the city conducts a survey by sending a request to all residents’ wallets, asking each individual to share whether they have used any of the services within the past year. The collected data can be useful for funding and allocating resources, either to create additional injection sites or to provide additional affordable rehab and recovery options for residents. However, this information is very sensitive, and some people may not want to share it if it can be traced back to them. To overcome this, the platform uses decentralized identities to provide anonymity to the user. Moreover, binary information of this sort is transferred to the city as a simple yes/no. This means that the city only receives the number of individuals who answered yes and the number who answered no, and cannot trace any answer back to an individual.

Transactions/Cross Chain

Let’s say an airport is a network in itself, developed from our framework. When you purchase tickets using your digital identity, they are issued by the airline and stored in the decentralized cloud storage. A transaction record is maintained on the airport network’s public ledger, but no private transaction details are visible to other members of the network. This is verified data, since a recognized airline has issued the ticket. You could then give consent to the machine at the boarding terminal to access your ticket, and board the airplane. This way, you are able to fly to a different city without sharing any personal information, and from the network’s perspective no one can trace the trip back to you, which provides you with a layer of anonymity.

Now, let’s say a bigger organization needs data on how many people have flown out of a certain region in the last week. In the current data model, the bigger organization would simply request this data from the airports and the airlines. The problem is that it could end up with all your information without you even knowing about it. The data is yours, and you should be able to decide who you want to share it with. Under our suggested model, the bigger organization would instead request data from the airport network. Since everyone flying through the airport is a direct member of the network, each of them will receive a data share request. They can see who is trying to get hold of their data, and they control exactly what data is shared, privately and anonymously, using their digital identity. Moreover, they can later choose to revoke the bigger organization’s access completely.

These use-cases just touch the tip of the iceberg. The possibilities are endless.

Conclusion and next steps:

We are currently ideating a network-wide consensus mechanism based on different levels of trust and quadratic voting[^12]. Not all networks and organizations can be trusted equally, and such a mechanism would allow people to assign more trust to certain networks. It would be especially useful when there is conflicting data across different networks.
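For readers unfamiliar with quadratic voting, its core rule is that casting v votes for one option costs v² credits, so expressing a strong preference gets expensive quickly. The Python sketch below applies that rule to trust in networks. It illustrates the general idea only, not the mechanism we will eventually design: the function names, the per-voter credit budget, and the network names in the usage note are assumptions.

```python
import math


def vote_cost(votes: int) -> int:
    """Quadratic voting rule: casting v votes on one option costs v * v credits."""
    return votes * votes


def max_votes(credits: int) -> int:
    """Largest number of votes a voter can place on a single option."""
    return math.isqrt(credits)


def tally(ballots: dict, budget: int) -> dict:
    """Sum the votes each network receives, rejecting any ballot over budget.

    `ballots` maps a voter to {network: votes}; `budget` is the number of
    credits every voter is given (a hypothetical, equal allocation).
    """
    totals = {}
    for voter, allocation in ballots.items():
        spent = sum(vote_cost(votes) for votes in allocation.values())
        if spent > budget:
            raise ValueError(f"{voter} spent {spent} credits but the budget is {budget}")
        for network, votes in allocation.items():
            totals[network] = totals.get(network, 0) + votes
    return totals
```

With a budget of 10 credits, a voter could place 3 votes on a hypothetical `hospital-net` (9 credits) and 1 vote on `airport-net` (1 credit), but could not place 4 votes on any single network, since that alone would cost 16 credits.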

In the future, we would also like to move into the tokenomics of things to assign actual value to data, and eventually even make data a tradable commodity.

As for the next steps, after peer review and community evaluation, we will start development using Hyperledger Fabric and related frameworks. The project will be open-source, and anyone will be able to contribute to it. If you believe you can help, feel free to reach out to us at soji@regionofwaterloo.ca.

Our framework could be used with no definite bound. We’re hoping the community will come together to think of new ways to build on this.

[^1]: Please note that as we proceed, you may come across the terms Verifying and Validating. These cannot be used interchangeably, since they refer to two independent concepts. Verifying data means that the stored data has been verified by a certain organization/issuer. An easy real-life way of thinking about this is the “blue” tick marks on social media platforms, but for data. Validating data, on the other hand, refers to the cryptographic process of confirming transactions on a blockchain network. The way a transaction is validated depends on the consensus protocol of the network.
[^2]: rwot4-paris/dkms-decentralized-key-mgmt-system.md at master · WebOfTrustInfo/rwot4-paris · GitHub
[^3]: IPFS Powers the Distributed Web : Decentralised Cloud Storage
[^4]: On-chain refers to the main distributed public ledger. Off-chain, on the other hand, is the side-chain or the decentralized cloud storage where all the data is stored, and only the user has access to their data.
[^5]: The Ultimate Guide to Consensus in Hyperledger Fabric | Skcript
[^6]: A decentralized storage network for humanity's most important information | Filecoin or Storj - Decentralized Cloud Storage
[^7]: What sets it apart: Filecoin’s proof system
[^8]: Zero Knowledge Proof: A Introductory Guide - 101 Blockchains
[^9]: Channels — hyperledger-fabricdocs master documentation (hyperledgendary.github.io)
[^10]: How Credential Revocation Works — Hyperledger Indy SDK documentation (hyperledger-indy.readthedocs.io)
[^11]: What are cross-chain bridges and why do they matter? | CryptoSlate
[^12]: Quadratic voting - Wikipedia

[ref1]: Aspose.Words.c0559116-17b4-4332-8057-a4a72768a0f7.005.png
[ref2]: Aspose.Words.c0559116-17b4-4332-8057-a4a72768a0f7.008.png
[ref3]: Aspose.Words.c0559116-17b4-4332-8057-a4a72768a0f7.011.png