(describes the design of the components and their relationships)
Dataswift's platform has a multilayered architecture. The stack comprises a technical structure with a legal structure sitting atop it, which together enable an economic structure to be formed.
Dataswift does not currently provide a way for application developers to manage all of their PDAs. The onus is on application developers to track all the PDAs that are provisioned via their application(s) or that have signed on to their application(s).
However, PDA management functionality is slated for 2021, and the most common data management functions will be available via the “Contracted PDAs” APIs. Read about “Contracted PDAs” here.
PDA owners can manage their PDA by accessing their personal URL.
PDAs are private by default, as each PDA sits within an individual’s personal data server, with the individual’s own database, and only they have the credentials to access it. All the data within a personal data account is legally owned by the PDA owner and forms part of that person’s property rights.
However, when a PDA owner chooses to share their data with websites and applications, the PDA owner and the application enter into a contract under which the PDA owner licenses a bundle of data to the websites/applications for a defined duration and purpose. Such a contract must be pre-approved by Dataswift’s governance department and set up through a review process before the exchange can be enabled. If the data includes personally identifying information, the PDA owner's identity will not remain private. If it does not, the PDA may be able to stay private (this depends on context and data types).
PDA owners can allow websites and applications to use the PDAs within the database to store the data those websites and applications generate. These accounts must also be set up by Dataswift, together with a contract entered into by both parties for first-party create/read/update/delete (CRUD) access, second-party READ access, or third-party create/update/delete access.
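To make the three contract types above more concrete, the sketch below models them as a simple permission table in Python. It is a hypothetical illustration of the access levels named in the text (first-party CRUD, second-party READ, third-party create/update/delete), not Dataswift's implementation; the `Party`, `Op` and `is_permitted` names are invented for this example.

```python
from enum import Enum


class Party(Enum):
    FIRST = "first"
    SECOND = "second"
    THIRD = "third"


class Op(Enum):
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"


# Hypothetical mapping of contract type -> operations it permits,
# mirroring the first/second/third-party access levels described above.
ALLOWED = {
    Party.FIRST: {Op.CREATE, Op.READ, Op.UPDATE, Op.DELETE},
    Party.SECOND: {Op.READ},
    Party.THIRD: {Op.CREATE, Op.UPDATE, Op.DELETE},
}


def is_permitted(party: Party, op: Op) -> bool:
    """Return True if the contract type grants the requested operation."""
    return op in ALLOWED[party]
```

Note that in this table a third party can write into the namespace but cannot read back from it.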
Since a PDA sits within a server database, it can act as an identity authenticator without the application holding identity information. Some applications choose not to store any identifying information in the app’s PDA namespace, so as to be identity-blind. Others choose to store identifying information because it is part of their service (e.g. loyalty apps). Hence, an individual PDA owner could have PDAs holding data from applications that identify them, side by side with PDAs for applications that do not know who their users are. So that individuals know how applications that use PDAs treat their data, Dataswift has a rating system for websites and applications. It is up to individuals to choose, having seen the ratings, whether they wish to install and engage with any particular app.
A special type of PDA, called a contracted PDA, enables individuals to permit additional credentials to access the PDA. This is often used when both an individual and an organisation wish to have independent access to the PDA namespace for mutual benefit, e.g. student records or employee records. If a PDA holds no identifying information, an organisation can also retrieve its data for aggregated analytics, preserving the privacy of individuals. The contract to set this up is also subject to review by Dataswift governance.
Since the PDA owner owns the PDA and legally licenses the data and/or space to the application, the PDA owner has legal recourse against the application owner should it use the data inappropriately or breach the contract's terms of usage. Similarly, if Dataswift or our regulator receives a complaint about potential misuse, access may be suspended.
*Dataswift keeps the email address of the PDA owner for account creation, security and system notification purposes, but Dataswift has no access to any data in PDAs.
Identity and user data storage is the simplest design pattern. Instead of building your own back end, you can use PDAs as your user identity and data storage. To see how to integrate PDAs, read "User Journeys integrated with PDAs".
Depending on the complexity of your application, it will fall into one of these design patterns.
Building on decentralised Personal Data Accounts is a paradigm shift, not just technically but also legally, economically and ethically. You should seek advice on which of the above design patterns would suit you best, and on how to build your application so that it handles personal data in an ethical, responsible and privacy-preserving way.
The HAT is a “HAT Microserver”: a unique personal data server that grants its user legal rights over its contents. The HAT Microserver is one of the underpinning technologies of Dataswift’s Personal Data Account technology, and it confers data rights. Dataswift's Personal Data Accounts, issued for websites and applications, provide individuals with data rights, mobility and control.
Read about the HAT Microserver here.
The word “HAT” (abbreviation of Hub-of-All-Things) is not used on its own; it is followed by another word that gives it meaning. For historical reasons, the word “HAT” has been used synonymously with PDA (Personal Data Account), but this is not technically correct.
The name of the personal data server within Dataswift’s proprietary Dataswift One Platform that gives individuals data rights, built up from the open-sourced Hub of All Things technical code. This is not to be confused with a Personal Data Account (PDA), which is the account within a Personal Data Server that Dataswift gives to individuals through the Dataswift One platform when requested by websites and applications. A Personal Data Account (PDA) is a service that provides data rights, data mobility and data control.
The database within the HAT Microserver where an individual’s personal data is kept.
Data inside a HAT database.
Available at https://github.com/Hub-of-all-Things/. This is not to be confused with the HAT Microserver, which is built on the open-source software but is deployed in the cloud and transformed by Dataswift’s HATDeX platform into a legal, economic and technical artefact that can be legally owned by a person. The HAT open-source software could be deployed for use by any entity, such as a person, an organisation or a thing. However, the HAT Microserver can only be owned by a person through Dataswift’s proprietary IP.
The regulator of the Dataswift One platform and the guardian shareholder of Dataswift.
A product/service from Dataswift that supports research by universities using PDAs.
A framework of data rights that provides the guiding principles to enable HCF and the Ethics Board to establish, operate and continuously improve the regulatory governance system executed by Dataswift.
The original 6 university, £1.2m RCUK research project led by Professor Irene Ng in 2013 that created the concept of personal data rights, mobility and control.
The user owns the database and all the namespaces within it. For an app to read/write into the user's database, a contract between the app and the user is required. The HMI screen/contract (HMIC) allows the user to give permissions to the application owner.
Technically speaking, you have multiple PDAs sitting in one Personal Data Server (PDS), each of which has a namespace.
When a user signs on to the platform, a Personal Data Server is provisioned for that user. Plainly speaking, it's 1 email address to 1 Personal Data Server (a 1-1 relationship). The PDS is so called because it has a storage system, which the individual owns. It also comes with associated software APIs and HMI contracts that let you control and permit data access, as well as lesser-known Tools/SHE functions to perform private computation on your data.
Every external party establishes a data relationship with the PDS. In this relationship, the location where data is exchanged is formalised; this is known as the namespace. Every such relationship constitutes a Personal Data Account. In short, a PDA encapsulates the namespace and the relationship with 1 external party. Hence, 1 email-person -> 1 PDS -> many PDAs (one per external party).
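The PDS/PDA relationship can be pictured as a small data model. The following Python sketch assumes an in-memory registry keyed by email address; the class and method names are hypothetical and only illustrate the 1 email -> 1 PDS -> many PDAs cardinality, with one namespace per external party.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PersonalDataAccount:
    external_party: str  # the app or website in this data relationship
    namespace: str       # the formalised location where data is exchanged


@dataclass
class PersonalDataServer:
    owner_email: str
    # external_party -> the single PDA for that relationship
    accounts: Dict[str, PersonalDataAccount] = field(default_factory=dict)

    def open_relationship(self, external_party: str, namespace: str) -> PersonalDataAccount:
        """Create the PDA for this party, or return the existing one unchanged."""
        if external_party not in self.accounts:
            self.accounts[external_party] = PersonalDataAccount(external_party, namespace)
        return self.accounts[external_party]


# Hypothetical registry: one email address maps to exactly one PDS.
servers: Dict[str, PersonalDataServer] = {}


def provision(email: str) -> PersonalDataServer:
    """Return the PDS for this email, provisioning it on first sign-on."""
    if email not in servers:
        servers[email] = PersonalDataServer(email)
    return servers[email]
```

Calling `open_relationship` twice for the same external party returns the existing PDA, which reflects the one-relationship-per-party rule; a real system would also record the contract behind each relationship.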
A PDA issued from an application or website uses the PDS as an identity service. The PDS can be used as an identity solution because it has the necessary security qualifications to authenticate the data in the PDA. However, if it is used for identity, the service provider has to choose which data within the PDA it wants to verify against. Identity that is bound to a real person is therefore a verification of data that has come from another source, e.g. a bank or Twitter. The PDA is useful for the original holder of the source of truth to place a piece of information within the PDA (e.g. library account holder 12345) that can be verified against that source of truth.
Dataswift's platform and PDAs are hosted on AWS in Ireland (eu-west-1) for PDAs issued by apps in Europe, and on AWS in the USA for PDAs issued by apps in the USA.
Data at rest within the PDA is stored in two forms:
- Files are stored in AWS S3, a key-value object store.
- Data is stored in AWS Relational Database Service (RDS) database servers.
If you sign up for a PDA in the US, your PDA will be hosted in the AWS US region; similarly, if you sign up from Europe, it will be stored in the Europe region.
NB: Data storage regulation applies to data controllers that are organisations. The data controller of a PDA is the individual themselves, so PDAs are out of scope of data regulations, including HIPAA, Schrems II etc.
Dataswift enables apps to interact with the personal data accounts of individual users. It provides the technical, legal, and commercial infrastructure that means data can flow between the PDAs and the applications that are built to use them.
So, as far as app developers are concerned, the personal data accounts and Dataswift's infrastructure are back-end tools. It is up to app developers to create a good user interface.
The only area where Dataswift gets involved with user interfaces is in providing special screens which enable the PDA owner to give permission to the apps to read/write data. Apps that use PDAs will route their users to these screens to give permissions. You can see their user journeys here.
Yes, applications that do more complex analytics with personal data should have their own product server. Check out the design patterns to see what you would like to use PDAs for. Some personal data is very sensitive and should be held in PDAs, whilst other data may not be sensitive and could sometimes be pseudonymised (de-identified). You will need a good solution architect.
Our app wants to be able to search customers by their cars' license plates without fetching all customers. We did this by storing license plates in our separate database (in unreadable, hashed format), but this also allows us to see if a license plate is in our DB. Does this count as storing part of the "plutonium" ourselves? If so, is there another way to do this using Dataswift's API?
Under the law, you can store any personal data in your own back end as long as it is not used for any purpose other than what is stated in the app. Dataswift will enable search within PDAs in the future, so you may wish to put that data fully in the PDA. However, to find out whether there are any potential issues, it's best to submit the app for review to go live and state the reason for your design in that review. The governance team may ask for an impact assessment, and you may need to report on it as part of the review.
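The question above turns on a property of hashing: a deterministic hash still supports exact-match lookup, which is exactly what lets you test whether a plate is in your database. The Python sketch below, using the standard `hmac` and `hashlib` modules, shows a keyed hash (HMAC-SHA256) index; the normalisation rule, key and customer ids are hypothetical. A keyed hash is preferable to a plain one because the space of plate numbers is small enough to enumerate, but either way the result is pseudonymised rather than anonymous data.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret, which should live outside the database (e.g. a secrets manager).
SECRET_KEY = b"replace-with-a-secret-key"


def normalise_plate(plate: str) -> str:
    """Uppercase and drop spaces/hyphens so 'ab-12 cde' matches 'AB12 CDE'."""
    return "".join(ch for ch in plate.upper() if ch.isalnum())


def plate_token(plate: str) -> str:
    """Keyed HMAC-SHA256 digest used as the lookup column in your own DB."""
    msg = normalise_plate(plate).encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


# Your database stores token -> customer id, never the plate itself.
index = {plate_token("AB12 CDE"): "customer-42"}


def find_customer(plate: str) -> Optional[str]:
    # The same computation that enables search also reveals membership:
    # whoever holds the key can test whether any given plate is present.
    return index.get(plate_token(plate))
```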
PDAs are owned by the users of an application. We would consider that a B2C model. However that’s a simplistic answer. For a more comprehensive view of possible models, see our “Design Patterns using PDAs” → https://docs.dataswift.io/knowledge-base/design-patterns-using-pdas
No, PDAs and blockchains are two completely separate technologies with highly differentiated use cases.
The PDA is a personal data server, complete with storage, database, authentication, computation and server capabilities, which is legally owned by the individual. It is used as a data server, but in actual fact it is much more than that.
A blockchain is a digital record of transactions. Blockchains are useful for recording/logging transactions, but storage on blockchains is limited. PDAs are used for storage and computation. Blockchain applications may choose to store their crypto assets on the PDA.
For example, TODAQ is a partner in the ecosystem and will give out PDAs when members join their TODA Network, since combining TODA and PDAs widens the range of applications that can be built on TODA.
You need to log in to the Developers Portal with your own PDA. If you haven't got one, you will be given one when you register (and also given a sandbox one for testing). If you already have one, you should just log in with it. Here's the flow chart. Remember, if you used an email address before, it is likely you cannot re-use it (for security reasons).
The Dataswift ONE Platform, operated by Dataswift, is what enables data rights, data mobility and data control for individuals and organisations.
PDAs are hosted in Dataswift's cloud infrastructure. Personal data written into the PDAs would hence reside in the said infrastructure. Data written into the PDAs are legally owned by the PDA owners, who are in full control over access and sharing preferences of said data.
It depends very much on your use cases. Please explore our “Design Patterns using PDAs”.
Dataswift’s product achieves the goals of mobility and unification through a unique architecture of personal data. Our architecture focuses on keeping personal data centred around the individuals and granting them full control over where, when and how to share it. As a result, an individual can create unified datasets around themselves and selectively share relevant data with third parties through their personal API on the web.
It depends very much on your use cases and the maturity of your current system.
You can choose to mitigate your risk by partially porting your system to the Dataswift platform. We would in fact prefer you to do such a pilot, rather than an all-or-nothing migration. As a deep tech company, we totally understand and welcome such prudence.
Yes, especially when you need to do aggregated processing and/or analysis of the data of multiple PDAs.
There are several reasons why you would leave data in the PDAs:
- Privacy - You can only access data on the PDA if you have been permitted by the PDA owner to do so.
- Data Freshness - data on the PDA is verified and maintained by the PDA owner. Hence, the data is never stale.
- Cost of storage - it costs you financially to store information about your customers. As time goes on, most of your customers' data becomes stale.
- Analytics - Personal data can have analytics performed on it within the PDA infrastructure (via PDA Functions) and the resulting analysis can be pulled anonymously and aggregated ensuring a scalable, secure and compliant data science function.
For a deeper discussion, see “Building Apps? Why they should be built on PDAs?”.
Design your application around the design patterns described previously. Then decide the application environment you want to build in. You might need to stand up your own application server if your application is more complex.
No, we do not impose any structure on the data itself. The Namespace can be configured as you see fit.
Yes, but this wouldn't be the most efficient set-up. This is a question of system architecture and the most efficient set-up would be one that uses a Contract PDA. Read more on Contract PDAs here.
Dataswift's infrastructure is highly versatile and can be bent and moulded to fit various use-cases. This is a matter of architecture design and is possible.
Dataswift does not currently offer ORM, but ORM can be done 'upstream', and Dataswift facilitates pulling and pushing data into a designated namespace for a given user; both structured and unstructured data can sit within the PDA.
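Because ORM happens upstream, the mapping between your application's objects and PDA records lives in your own code. The Python sketch below maps a dataclass to a JSON body and back. The `/{namespace}/{endpoint}` suffix on the `/api/v2.6/data` path, the `Vehicle` model and the namespace names are assumptions for illustration; no request is actually sent.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class Vehicle:
    # An application-side model; the PDA imposes no schema on it.
    make: str
    model: str
    year: int


def to_pda_write(namespace: str, endpoint: str, record: Vehicle) -> Tuple[str, str]:
    """Map a model object to a (path, JSON body) pair for a namespace write.

    The URL layout is an assumption for illustration, not a confirmed route.
    """
    path = f"/api/v2.6/data/{namespace}/{endpoint}"
    body = json.dumps(asdict(record))
    return path, body


def from_pda_read(raw: str) -> Vehicle:
    # The reverse mapping, also done upstream in your own code.
    return Vehicle(**json.loads(raw))
```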
(describes the actual set of components that make up a system)
You get a PDA when one is issued to you through a client’s app. Or you can get one directly from Dataswift on our website and choose which apps you wish to log into.
We currently do not offer this functionality. However, we are working on the functionality to export data as JSON files for 2021.
Federated learning is a good complement to PDAs and can be very powerful, but it depends on the purpose. Within PDAs, PDA Functions can perform “edge”-like computation on a person’s data, but these tools must be pre-trained. Read here for more information on PDA Functions.
One could move data from PDAs to a device for federated learning and then push the results back to PDAs or a product server, but the pattern and design need to be crafted to suit the purpose. Depending on what you want to process and where, a combination of federated learning, an app server and a HAT/PDA can be more powerful than federated learning and an app server alone.
Also, PDAs come with a full economic model already running, i.e., merchants (apps) pay for API calls. Data scientists who create new insight data in PDAs will receive royalties if the data is used by apps. The 3 business models federated learning can have with PDAs are:
- 1. Monetize the trained AI on behalf of all the people whose data it was trained on (B2C)
- 2. Sell the AI to enterprises who want it for their apps enabled by PDAs (B2B)
- 3. Contribute trained data into PDAs for other apps that may want it (infrastructure)
Normally, PDAs are issued with 100MB of storage.
Typically, all third party data should go into its own namespace for 3 reasons:
- 1. Anyone else can create a similar data plug, and PDAs should not have duplicate entries. There are exceptions to this, for example when third party data is private (e.g., credit ratings). In such cases, Dataswift governance will assess the risk as such plugs come through review, to see whether duplication poses a risk to the system.
- 2. Bringing in third party data has legal implications, e.g., the source imposes conditions on re-use and re-sharing. The way to ensure that the data can be owned by the PDA owner is to use our Data Plug system, where such issues have been resolved.
- 3. To reduce confusion for the user and to comply with data protection rules, the user has to know that the data was not created or acquired by the app but comes from a third party, and the name of the third party must be shown. Again, the data plug system takes care of that with consistent messaging in a third party write contract.
So the best practice is to put third party data into their own namespace.
However, just because it’s in its own namespace doesn’t mean the app can’t use it. It’s a Data Plug and the application can call on a data debit to use that data from their app. The use of that data is subject to governance but that’s the standard review process of ascertaining usage, duration and purpose and the DPIA/PIA for the app.
Yes, although the app will be “B” rated according to the HATDeX certification rating. We understand that some applications, e.g., research, will need to store a local copy of the data. If the app is in a jurisdiction where there is a “right to be forgotten”, it will need to delete the data if the person disables the data debit and invokes that right. Also, whatever is stored locally may need a “refresh” to keep it recent and useful.
Yes. Dataswift operates a cloud infrastructure to hold the data of each PDA in individual data silos.
Dataswift is purely a data infrastructure. It does not process any data on behalf of non-PDA owners, be they individuals or companies.
The PDA is open sourced and can be found here.
The code requires PostgreSQL and Java SDK.
At the moment, it does not install onto a mobile device.
We only track API calls between the app and end user for quality of service and billing purposes. The actual data between the app and the PDA is not tracked.
For more information on what we are tracking, please refer to the following guide.
We encourage the use of the Dataswift platform’s contract creation capability to register and track all the entities interacting with a PDA owner’s personal data, both inter- and intra-organisation. The data exchange happens through first-party contracts between the registered entities and the PDA owner. Such a system provides transparency, a full audit trail and the best possible regulatory compliance.
From a technical perspective the PDA does not automatically differentiate between the different types of data. However, all the live applications go through our internal compliance review where data risk profiles are evaluated. Depending on the risk profile our team is likely to recommend application permission adjustments to best mitigate overall risk.
While the PDA can technically store all kinds of data, it is recommended that only personal data or data related to the PDA owner be stored on the PDA. This is because
- 1. any data written to the PDA belongs to the PDA owner
- 2. it is preferable to store business data externally so that it is decoupled from personal data and associated contracts. If the user revokes the business's data access, any business data stored on the PDA would also become unavailable and lost to the business.
We have a simple endpoint to tell you the data that is available in the PDA ecosystem. However, we cannot tell you which PDAs have what data. You can only query the PDA for the existence of such data when the owner has signed on to your application.
No, unless there is a change in the namespace permissions and/or the namespaces the data is being requested from. We only require an update when there is a material change to the data sharing contract between the PDA owner and your entity.
Our PDA APIs are usually backward compatible. Otherwise, they are versioned in the API URLs e.g., …/api/v2.6/data.
A developer could (with the correct permission) extract, perform external calculations and put this data back into the PDA.
However, this could affect the rating the app is given. PDA Functions allow an app developer to run algorithms on PDA data as private serverless functions, only when needed, without the data leaving Dataswift and while maintaining its integrity. This enables a higher rating for the app, which bolsters end-users' transparency and trust.
More info on this is here.
Revoking data access does not mean:
- The PDA is deleted
- The data that was requested for is deleted
Revoking data access means:
- The application no longer has access to the data, even if a valid application token is provided
- The same application must request again for permissions to access the same data
Revoking data access puts a stop on the ability for an application to read & write data into an end-user's PDA.
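The distinction between revoking access and deleting data can be sketched as follows. This is a hypothetical in-memory Python model, assuming grants are keyed by application id; it is not Dataswift's API. Revocation removes the grant only, so a request with an otherwise valid token is refused while the PDA and its data remain intact.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Pda:
    data: Dict[str, List[dict]] = field(default_factory=dict)  # namespace -> records
    grants: Set[str] = field(default_factory=set)              # app ids with access

    def revoke(self, app_id: str) -> None:
        # Revoking removes only the grant; the PDA and its data are untouched.
        self.grants.discard(app_id)

    def read(self, app_id: str, token_valid: bool, namespace: str) -> List[dict]:
        # A valid token is not enough: the application also needs a live grant.
        if not (token_valid and app_id in self.grants):
            raise PermissionError(f"{app_id} has no access; it must request permission again")
        return self.data.get(namespace, [])
```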
Dataswift will not have its own mobile application; we are moving to responsive web design. However, developers can develop apps for any platform they want to support.
There are in fact no data transformations that Dataswift’s platform facilitates or deploys.
However, there is a data mapping function that allows the remapping of raw data.
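A data mapping of this kind can be thought of as a table from output fields to paths in the raw record. The following Python sketch assumes a dotted-path notation and hypothetical field names; it is not the format Dataswift's mapping function uses, only an illustration of remapping raw data without altering it.

```python
from typing import Any, Dict


def get_path(record: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'track.artist.name' through nested dicts."""
    value: Any = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # missing fields map to None rather than raising
        value = value[key]
    return value


def remap(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build a new record whose keys come from `mapping` (output key -> source path).

    The raw record itself is left unchanged.
    """
    return {out_key: get_path(record, src) for out_key, src in mapping.items()}
```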
Unless rendered into text-based data, files cannot be stored in the PDA database. Instead, files are held in the S3 storage system offered by AWS and managed by Dataswift. This can include items of data such as photos and pictures.
Due to the potential copyright held by 3rd parties in files held within the PDA storage system, Dataswift currently does not allow Data Debits to access information stored in the file system. The file storage system is therefore a part of the Services that Dataswift licenses to PDA owners.
Images saved to the PDA via the File Storage API are not transformed in any way. However, there’s nothing stopping the application developer from using encoding techniques on images and storing the output in the PDAs. One such technique is uuencode/uudecode.
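As an illustration of the encoding route mentioned above, here is a minimal round trip using Python's standard `binascii.b2a_uu`/`a2b_uu` functions, which implement uuencoding one line (at most 45 input bytes) at a time. How the resulting text would then be written into a namespace is left out; the function names are hypothetical.

```python
import binascii

UU_CHUNK = 45  # b2a_uu accepts at most 45 bytes per call (one encoded line)


def uuencode_bytes(data: bytes) -> str:
    """Encode arbitrary bytes (e.g. an image file) as uuencoded ASCII lines."""
    lines = []
    for i in range(0, len(data), UU_CHUNK):
        # Each call returns one encoded line, including its trailing newline.
        lines.append(binascii.b2a_uu(data[i:i + UU_CHUNK]).decode("ascii"))
    return "".join(lines)


def uudecode_text(text: str) -> bytes:
    """Decode the output of uuencode_bytes back to the original bytes."""
    out = bytearray()
    for line in text.splitlines():
        if line:
            out += binascii.a2b_uu(line)
    return bytes(out)
```

Either way, the encoded text is larger than the original file, which counts against the PDA's storage allowance.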
PDA owners are able to see the output of the AI tools from within the PDA and can also view the new data generated by the PDA Function in their PDA to give them more insights into their data.
Data Plugs are applications deployed by Dataswift to enable PDA owners to acquire their own personal data as made available by 3rd party organisations through open APIs, such as Facebook and Spotify data. There can also be private data plugs, where a healthcare provider or an enterprise creates the data plug to enable data subject access requests by the user. Such plugs must be reviewed and certified by Dataswift and deployed into Dataswift’s system before they can be used in production environments.
When a user signs in with their PDS to a PDA-enabled website or application, the HMIC screens associated with that website/application are presented to the user. At this stage, nothing is written to the PDS yet. Once the user agrees to the contract points set out in the HMIC screen, the application is set up for use with the PDS, i.e. the PDA is created at this point. Technically, the application ID is registered and the permitted location for data exchange (the namespace) is configured. Everything is tied together by an application-id. Reference numbers are more for centralised systems to identify unique/individualised records. Because the agreement is written to the owner's PDS database, an individual HMIC reference number is moot/redundant.
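The sign-in flow above can be reduced to a small state change: nothing is stored until the user agrees, and agreement registers the application-id with its namespace. The Python sketch below is a hypothetical model of that sequence only; the HMIC screens, contract terms and real storage are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PersonalDataServer:
    # application-id -> namespace the application may exchange data in
    registered_apps: Dict[str, str] = field(default_factory=dict)


def sign_in(pds: PersonalDataServer, application_id: str, namespace: str, agreed: bool) -> bool:
    """Sketch of the HMIC step: nothing is written unless the user agrees."""
    if not agreed:
        return False  # the screens were shown, but the PDS is unchanged
    # On agreement the PDA comes into existence: the application-id is
    # registered and its permitted namespace configured.
    pds.registered_apps[application_id] = namespace
    return True
```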
Data Plugs act as intermediaries between third party systems and individual PDA owners. By using data plugs, PDA owners can gain access to their data from organisations who do not have PDA-enabled Apps. Utilising a Data Plug involves a multi step process containing both contractual and data protection considerations.
Dataswift's data protection has appropriate physical, administrative and technical security measures to safeguard all the information we collect in connection with the provision of PDAs and Services.
We continually review all security measures and update them when appropriate. Dataswift deploys multiple layers of protection to protect personal data stored.
The incentive to hack a PDA is low (like the router in your house), since all the hacker would get is one person’s data instead of 100m people’s data. Hence, this reduces the surface area of attack, making it more secure.
Our security meets the industry-grade standard for financial data (compliant with the FCA).
If Dataswift (when it is considered the controller) experiences a personal data breach involving the PDA owner’s data, Dataswift will notify the ICO and the PDA owner.
Dataswift, as a matter of good practice, will carry out a privacy impact assessment (PIA) of all its processing of personal data. In certain cases, where the processing is more risky, it will also need to carry out a data protection impact assessment (DPIA) – which is a requirement under the GDPR.
Data is stored in AWS Relational Database Service (RDS) Database Servers. File storage is configured with server-side encryption using AES-256. The storage policy enforces encryption of any file uploaded into the storage. All logs, backups and snapshots for a Database Server are encrypted. Standby replicas of the Database Servers, maintained for reliability, are also encrypted.
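One common way to enforce encrypted uploads on S3 is a bucket policy that denies `s3:PutObject` requests lacking the server-side encryption header. The Python sketch below builds such a policy document, modelled on AWS's published examples; the bucket name is hypothetical and this is not Dataswift's actual configuration.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str) -> dict:
    """Bucket policy rejecting PutObject requests not using SSE-S3 (AES-256)."""
    objects_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny uploads that request a different encryption setting.
                "Sid": "DenyIncorrectEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
                },
            },
            {
                # Deny uploads that omit the encryption header entirely.
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_unencrypted_uploads_policy("example-pda-files"), indent=2))
```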
The Dataswift model reduces the external risk in several ways, these include:
- an individual server and API for each PDA, this provides a much smaller surface area for an external threat to attack
- the cloud storage (provided by AWS)
- Access Token authentication process which ensures access and control of the PDA is securely held by the PDA owner
We are following the best industry security practices. Please refer to the sections on “Server and Network Management” and “Data Security” in our “Security Requirement Best Practice”.
Every PDA owner has their own data silo with unique access credentials. If one PDA is compromised, the data on other PDAs is still secure. A group of PDA accounts (and their associated data) might be hosted in the same physical location; however, the distributed nature of control over the accounts (where each user has full control of their own account) means that the overall system is decentralised.
With regards to data privacy, the PDA owner has full control of the applications that have access to their data. They can terminate an application's data access at any time via their PDA dashboard.
Currently, PDA private keys are held in a dedicated vault in the cloud infrastructure. We are eager to bring even more security assurances to our users and allow private key storage locally. We are also mindful of the fact that more secure solutions usually deliver less favourable UX to non-power-users and want to take our time to get these improvements right.
To help mitigate Machine-in-the-Middle attacks, we ensure everything that enters and exits our network is encrypted. Additionally, all traffic between our services on public networks is also encrypted. We are rolling out Platform Authentication using verifiable access tokens that will give us a level of control to instantly remove API access if the need arises.
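The idea of verifiable access tokens that can be revoked instantly can be sketched with Python's standard library: the server signs each token with HMAC-SHA256, checks the signature on every request, and also consults a revocation list. This is a hypothetical illustration, not Dataswift's token format; a production system would more likely use a standard such as JWT.

```python
import base64
import hashlib
import hmac
import json
import secrets
from typing import Optional, Set

SIGNING_KEY = secrets.token_bytes(32)  # hypothetical server-side key
revoked_ids: Set[str] = set()          # token ids revoked by the platform


def _sign(payload: str) -> str:
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue(app_id: str) -> str:
    """Issue a signed token carrying the app id and a unique token id (jti)."""
    claims = {"app": app_id, "jti": secrets.token_hex(8)}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    return f"{payload}.{_sign(payload)}"


def verify(token: str) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is not revoked."""
    try:
        payload, sig = token.rsplit(".", 1)
    except ValueError:
        return None  # malformed token
    if not hmac.compare_digest(sig, _sign(payload)):
        return None  # tampered or forged
    claims = json.loads(base64.urlsafe_b64decode(payload.encode()))
    if claims["jti"] in revoked_ids:
        return None  # revoked: rejected even though the signature is valid
    return claims


def revoke(token: str) -> None:
    """Remove API access immediately by recording the token id."""
    claims = verify(token)
    if claims is not None:
        revoked_ids.add(claims["jti"])
```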
The simplest and most effective way of limiting the risk of leaks from an employee is to reduce the number of people with access to production data. In our case, one person has direct access to production systems. Any and all data is encrypted both in transit and at rest in the database. External bad actor risks are addressed through regular penetration tests by a third party and engineering-led library and dependency audits for flaws. Auditing the code we rely on is being automated and included in our Continuous Integration solution. In concert with these specific solutions, the preventative measures outlined in this answer, namely SSL everywhere and revocable access tokens, are also important steps towards preventing leaks.
Encrypted data can also be decrypted, which is why access is restricted: direct access to the production system is limited to one person. Access is controlled by a robust user role infrastructure that logs all actions taken per user, allowing us fine-grained limitation of privileges. All of our data is encrypted in transit and at rest, with differing passwords per user database.
Our Data Durability is addressed in a number of ways. We automatically back up our databases daily and prior to any deployment in order to provide a restorable snapshot. Our databases are deployed in multiple availability zones and are redundant in production. At an application level, we use a soft-delete strategy that marks records as deleted, but they remain as a history that can be restored at a finer-grained level than the process outlined above. We plan on improving this design in the future and on the roadmap is a full Event Sourcing architecture which will further increase our durability. Beyond processes and software solutions, our hardware is monitored constantly and in the event of any degradation the node is replaced automatically.
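A soft delete, as described above, can be sketched with Python's built-in `sqlite3` module: rows carry a `deleted_at` marker instead of being removed, normal reads filter on it, and restoring a row simply clears it. The table and column names are hypothetical, and a production system would use its own database engine.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT NOT NULL, deleted_at TEXT)"
)


def soft_delete(record_id: int) -> None:
    # Mark the row as deleted instead of removing it.
    conn.execute(
        "UPDATE records SET deleted_at = CURRENT_TIMESTAMP WHERE id = ?", (record_id,)
    )


def restore(record_id: int) -> None:
    # Clearing the marker brings the row back; no backup restore is needed.
    conn.execute("UPDATE records SET deleted_at = NULL WHERE id = ?", (record_id,))


def live_records() -> List[Tuple[int, str]]:
    # Normal reads filter out soft-deleted rows.
    return conn.execute(
        "SELECT id, body FROM records WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
```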