Laurent Hardy | Rubén Muñoz
This document is not meant to be an exhaustive technical explanation but a way to start a conversation with the community about the project: how personal data is controlled, what it is used for, how it could shorten the path towards self-consciousness and how it could revive the digital economy for the people of the 21st century.
Generally speaking, the digital economy works by building applications on top of free protocols that belong to the public domain. This is what led the internet to evolve from what is called web1 to web2, with the addition of a social layer.
This leap forward didn't come for free: web2 was built by companies. The way the social layer was developed as a business but remained accessible for everyone to join for free was by creating a market around personal data. Users don't have access to that market, only the platform that owns the data. This remains the case today, as almost every digital platform eventually starts making profits by exploiting some sort of data market.
There's an important distinction between data by itself and what you can learn from the data (i.e. the process of creating knowledge). The companies that built web2 didn't start by selling personal data; they started by learning about their users. By the time they knew them well enough, they had built an elaborate digital profile of every user. They could then leverage that knowledge to their own benefit by offering a platform to companies that wish to target products and services at specific groups of users with particular interests or profiles. The reasoning behind this is that a previously identified profile is the most likely to buy their products and services. The return-on-click metric was born.
The real advantage for companies leveraging data about their users is in the combination of three elements:
- A digital platform with some utility to its users
- Monthly active users who create data by using the platform
- A digital profile fed by the data the user generates by using the platform and the interpretation of data scientists.
All three elements are necessary to be successful as a business. A platform with only one of the three elements is practically useless. This means that personal data itself is not worth much without the context in which it has been generated. This is probably why, when Facebook bought WhatsApp for $19 billion at a time when it had 450 million monthly active users, each user's data profile was valued at around $42. And even though Facebook appears, in some cases, to know people better than psychoanalysts, users are, on average, worth between $1 and $14 per quarter.
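The per-user figure is a plain division of the purchase price by the number of monthly active users. A minimal sketch of that arithmetic follows; the two user counts are illustrative inputs chosen to show how sensitive the result is to the count assumed, and the function name is hypothetical.

```python
def value_per_user(price_usd: float, monthly_active_users: int) -> float:
    """Return the purchase price divided by the number of monthly active users."""
    if monthly_active_users <= 0:
        raise ValueError("monthly_active_users must be positive")
    return price_usd / monthly_active_users


PRICE_USD = 19_000_000_000  # purchase price quoted in the text

# Illustrative user counts (assumptions, not audited figures).
for users in (350_000_000, 450_000_000):
    print(f"{users:,} users -> ${value_per_user(PRICE_USD, users):.2f} per user")
```

With these inputs the result is about $54 per user for 350 million users and about $42 per user for 450 million users.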
As a user of these digital platforms, one should see herself as a volunteer working for a multibillion-dollar company.
Users are somehow incentivised (and it's not money) to populate the platform with content they generate, circulate this content as much as possible amongst new and existing users and consume the content of other users. This in return will update an ever more accurate digital profile of themselves and of the users they know.
Therefore the fundamental objective of the platform comes down to increasing the number of monthly active users. This is the proxy people use to measure how popular a platform is, how up to date the data backing the profile is and how frequently users come back to the platform. The number of monthly active users is also a metric that companies that wish to use the platform to advertise their products look at: the higher the number of monthly active users, the more diverse and representative the user base is, the higher the visibility an advertiser will get and hence the higher the probability of closing a sale.
It is important to note that as a result, the focus of digital business platforms has moved. Resources went from developing breakthrough services with fundamentally new features that enhance the life of users, toward capturing more of the attention of the users, whatever it takes.
The competitive advantage of platforms whose goal is to extract value from their users is the database and the intelligence infrastructure. They assume the responsibility of storing the data (and of respecting users' privacy, if they want to) and of ensuring the security of the storage infrastructure. They need access to the data if they wish to make a profit, but they can't afford to let others, let alone users, access it, at the risk of losing their advantage.
The meaning of online activities of the user such as blogging, text messaging, posting, buying and selling, updating profile and status is of little importance. Rather, activities are the excuse to have users generate and interact with different types of content, as varied as possible, in order to construct their profile at a granular level.
Users seem to like it: we spend more and more time in front of our screens. There's no real incentive to provide more useful services, and no real incentive to let others do it either. The only way forward seems to be to keep extracting value from users and excluding them from the commercial exchange where money is being made (see Throwing Rocks At The Google Bus by Douglas Rushkoff for a detailed analysis of the current digital economy).
The result of our extractive economy is that ownership and secrecy around personal data have favoured the formation of silos of data between companies. Data, from the most trivial to the most sensitive, is locked in lots of different services with no way of accessing it, deciding what it can be used for, or collecting a share of the income it might generate.
Digital platforms extract data from their users, store it in silos, then use it to build digital profiles of their users, which are then used to target commercial content at them via a feed mechanism. This process generates income on the order of 70 billion dollars per year in the US alone, because the entire supply chain is controlled by the same company. The more aspects of the supply chain it controls, the more money can be made. That's why we see digital platforms adding new "features" all the time: this is how control over the supply chain keeps increasing. No entity but the company has access to the internal mechanics of the digital platform, therefore innovation can only happen inside the company that created the platform in the first place (this actually holds true for every centralised industry, see The Internet of Money).
It appears that users' online activity is itself the raw material digital platforms extract value from, through a more or less complex ad-selling machine. As users, we're left with a free tool to spend time on, while we're not allowed to share in the revenue our data generates.
Under this model, time became the scarce commodity every business tries to capture –especially since the booming of social media–, which in turn, created a fierce competition for our attention. A competition that will last, probably forever.
A legitimate question to ask ourselves at this point is whether the business model of digital platforms, as it is configured today, really serves people's interests. Without belittling the achievements of building the social layer of the internet, it looks like the benefits don't always outweigh the disadvantages.
- Our limited available time gets eaten up.
- Local economic activity is progressively replaced by online marketplaces.
- Social media get to know us better than our relatives according to some research.
It is difficult to foresee what could be achieved if we had access to all the data generated in every aspect of an individual user's life. Nevertheless, assuming we do have access to such data, what else could we possibly be doing with it that we are not already doing today?
As a reference point, more than 60 years ago, in the context of a centrally planned economy versus an open-market society, it was argued at length that
A centrally planned economy could never match the efficiency of the open market because what is known by a single agent is only a small fraction of the sum total of knowledge held by all members of society (The Use of Knowledge in Society, by Friedrich Hayek)
If we were to accept Hayek's hypothesis that the more decentralised, the better the outcomes, then it has to be the case that the more people have access to the data, the more efficiently and interestingly the data will be used. Therefore one single company's intelligence infrastructure could never compete with the combined capability the rest of the world has to offer. By accepting Hayek's hypothesis, we have reduced the scope of the original question to working out better uses of our data besides selling ads. In any case, a new solution should always be in line with an increase in privacy, security and fairness with regard to the management of personal data.
The first immediate and fundamental feature that is achieved is the control over the use of personal data. Control over the use of data is different from control over the place where the data is stored. Those are two separate goals that should eventually be solved in the new paradigm, but not necessarily at the same time.
Control over the use of personal data acts as an enabler. Once the possibility exists, it will reinforce the trend of empowerment regarding privacy and security of personal data. This empowerment will drive the demand for new options, applications and services. This will activate the new paradigm for the digital economy.
Full control over your personal data (access and storage) is similar to a personal server: we call it a digital warehouse. A digital warehouse without storage nevertheless offers interesting functionalities, like visualising the data users have generated and uploaded, interpreting the data through different lenses and, of course, using it in applications. A digital warehouse should work as easily as any plug-and-play technology: by controlling the uses of your personal data, you can plug or unplug the data stream from an application that uses it.
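To make the plug and unplug idea concrete, here is a minimal sketch in plain Python. It is not the project's design: the class name, method names, and the stream and application names are all hypothetical, and a real system would enforce these permissions on decentralised infrastructure rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalWarehouse:
    """Hypothetical user-side registry of which applications may read which data streams."""

    owner: str
    plugged: dict[str, set[str]] = field(default_factory=dict)  # stream name -> application names

    def plug(self, stream: str, app: str) -> None:
        """Allow `app` to read `stream`."""
        self.plugged.setdefault(stream, set()).add(app)

    def unplug(self, stream: str, app: str) -> None:
        """Revoke access of `app` to `stream`; a no-op if it was never plugged."""
        self.plugged.get(stream, set()).discard(app)

    def can_read(self, stream: str, app: str) -> bool:
        return app in self.plugged.get(stream, set())


warehouse = DigitalWarehouse(owner="alice")
warehouse.plug("location_history", "fitness_app")
print(warehouse.can_read("location_history", "fitness_app"))  # True
warehouse.unplug("location_history", "fitness_app")
print(warehouse.can_read("location_history", "fitness_app"))  # False
```

The point of the sketch is that access is decided per data stream and per application, and that revoking it is as cheap as granting it.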
Data is the digital evidence of events happening in our digital life. Controlling your data is also being able to access that evidence, which is the first step towards taking better decisions in your life.
The moment users have full control over their data (access and storage), silos of data will start to break. Data will then be able to travel outside of where they are stored and escape from the control of the platform on which they have been created.
This emancipation of the data does not mean that whoever wants to gain access to personal data will be able to do so without any restriction. It just means that now the user has the option to decide if it's ok to let someone or a company have access to her data. The user can ask herself if it is really worthwhile or if she would rather remain private about her data. On the other hand, this also means that institutions that wish to use personal data will have to earn it from the users, and probably compete for them.
We see the digital economy evolving to a much more distributed workforce but the problem remains that raw materials and tools are still in the hands of a few very large companies.
Barriers to entry for business models that revolve around personal data are generally considered high in industries like social media, data brokerage, banking, digital services, etc. This is mainly because, as we said earlier in the section "Stagnation of centralised innovation", the focus in those industries has been on secrecy and confidentiality around silos of data. For them, it doesn't make sense to share the developments made by exploring personal data. Algorithms must remain private because algorithms are what differentiate one platform from another, and therefore they are the means by which personal data is exploited economically.
Data becomes a commodity when users control their data. The purpose of improving algorithms then switches from extracting value from the user to seeking the best interpretation of the data to gain knowledge: a path to data humanism. As a consequence, a lot more people with new ideas will have access to the same tools that were once the privilege of a few dominant companies. And this is where social innovation will really start.
The 21st century will hopefully be the century of self-consciousness. A time of realisation that we take too many things for granted, like education; that we don't take the time to know ourselves enough; that despite better communication technologies we actually interact less with one another, we are less and less informed about the people we live with; or that there's a growing misalignment of expectations between people and companies (Simon Sinek on Millennials in the Workplace).
The problem of taking things for granted of course is that because we take them for granted we don't realise that we do (Sir Ken Robinson).
The battle between digital platforms for our attention is time-consuming for users. In fact, our capacity to pay attention should be seen as a powerful tool for sensing the world around us. How we use our limited attention can have a huge impact on our life. It has even been demonstrated that changes in attention lead to changes in the internal structure of our brain.
Taking control of our data is a way to free ourselves from the attention traps that we too easily fall into. Not because we are weak or idiots, but because building these traps was the job of thousands of engineers, neurospecialists and psychologists. They know how the brain reacts to signals and how it can be tricked into directing our attention to what they want, not necessarily what we want or need. To measure is to know, and once you know it is easier to take useful decisions. The measure should be seen as a starting point towards a deeper level of consciousness about what is happening in our life.
Access to multiple sources of personal data would allow us to discover facts about our life that we might not even be aware of. Some might just be curiosities, others could be the cause of something fundamental that is happening in our life. At the same time, we would be able to learn about the community we're living in, or the country, the continent or the planet.
Without enough evidence, our brain makes up stories or heuristics in order to compensate for the lack of information. When we strongly believe in the stories, we make judgements about them: good or bad, high or low, enough or too little, great or lame. Finally, after the judgement comes the decision to do something about it. But the errors that result from the mismatch between the story we believe in and reality are what psychologists call cognitive biases. More trustworthy evidence presented to the user should reduce cognitive biases and allow him to take better decisions, regardless of private interests.
Control over personal data will allow us to drive consciousness towards the present moment, the place where the things happening are important to us because we give them importance, not because our attention got carried away to places important to others and for other reasons.
The paradigm shift of personal data implies that access to data becomes a matter of providing a service to the user with enough added value so that, according to his standards, it compensates for the loss of anonymity. The incentives could be social, economic, scientific, or other forms of incentives we haven't heard of yet. But the fundamental change is that this restores the balance between users and companies.
To get to the point where a digital profile from the new paradigm reaches a level of quality equal or superior to what is currently available from the old paradigm, a new class of digital services will be necessary. The rationale behind this particular class of services is that it essentially provides the user with enough insights about himself that his data means something to him, and can then be leveraged as something of value to others.
In this new paradigm, it is fundamental to understand that data is not scarce anymore. Hence, we can expect the value of personal data to shift from a proxy for the probability of being a potential buyer for a particular class of products, to a means of introspection that reveals valuable characteristics about our personality, which can later be marketed by the user at her own discretion. If data is not scarce anymore, and silos of data tend to disappear, it also means that the intrinsic value that previously existed around generating, holding and securing data will also decrease. The previous centralised infrastructure, which brilliantly supported a specific business model, is becoming obsolete and is therefore no longer entirely necessary in the new paradigm. That infrastructure is going to be replaced by open fat protocols, where the value shifts from the application to the protocol itself. This type of infrastructure is what is going to power an entirely new digital economy around personal data.
Note from the authors. This is the first of several articles that will be published. We have been working on a path to a possible solution for building the new paradigm. Although more detailed information could be released, we would first like to evaluate, with the help of the community, whether this appeals to people, whether we made mistakes in the hypotheses we use to validate the potential of this project, and whether this introduction to the solution for building the new paradigm resonates with what the developer community knows better than we do. Your feedback will be much appreciated. Thanks.
To develop the core technologies, we are going to need a clean environment exempt from pressures originating from the necessity to be profitable. This is a challenge most open source projects endure, but with the advent of blockchain, a solution is starting to emerge. That being said, we cannot forget that the objective is to build the pillars of a new digital economy for personal data, therefore we cannot forget that it has to be profitable at some point.
For this reason, we have divided the project into two separate entities: Arcadia (a not-for-profit foundation) and AlterEgo (a for-profit company).
- Arcadia FOUNDATION will be responsible for developing the core technology, Arcadia, including but not limited to the digital warehouse infrastructure, the governance model, the economic model and the intelligence infrastructure.
- AlterEgo will be responsible for experimenting with the tools that Arcadia will make available. At some point, AlterEgo should be a profitable business, providing services for the new digital economy for personal data and proving that it is not only possible but can also be sustainable in economic terms.
At this stage, we're going to focus on Arcadia, the first product built by AlterEgo. Information regarding value propositions for AlterEgo will be published later. For now, keep in mind that when we talk about institutions, we basically refer to AlterEgo or the like.
We have identified 4 types of actors who will participate in and interact with Arcadia. Each one of them, in its own way, will make Arcadia interesting, attractive, valuable and profitable.
- USERS. They decide to upload data to Arcadia. They want to control and learn from their data but generally don't know how to do it. They might decide to share their data with others if the incentives are there. They shouldn't be paying for controlling their data warehouse.
- DEVELOPERS. They are responsible for extracting the value of the users' personal data. Some have an incentive to do so because they are interested in doing research and would value having access to such a complete dataset. For others it should be more like a job, and therefore the economic incentive should be the royalties they receive when their algorithms are used in services.
- INSTITUTIONS. They are service providers. They want to provide their clients with services that require access to the user's data, involve building a relationship with the users, or involve having them actively participate in some way. They want to build new tools that use Arcadia. They need the approval of the users to actually sell their services. They have an interest in building and maintaining good relationships with users, even more than with their clients.
- CLIENTS. They have questions that can be better answered with the help of the users. They may have customers who are users of Arcadia and want to know if they are satisfied. They want to close deals with users through a service one of the institutions offers in exchange for the use of the users' data.
We believe the only way trust can be regained after the way personal data was handled in the old paradigm is by progressively decentralising every aspect of the underlying infrastructure. For both technical and economic reasons, blockchain appears to be a good compromise as a core technology of Arcadia. Using blockchain implies being able to correctly define the underlying cryptoeconomics of the project: a mechanism design that includes a set of incentives allowing all the parties to exploit the system to their own advantage without harming anybody else or the system.
Finally, the governance model around which the collaboration is going to be defined will be extremely critical to the quality of the results each participant (users and clients) is going to get in Arcadia. Governance is about how decisions are taken in a decentralised infrastructure, what voting mechanism it uses, who is eligible to vote and who is not. Governance is also about resolving disputes, facing responsibilities, or changing the rules.
We have identified four key modules for Arcadia to exist: DATA WAREHOUSE, DIGITAL PROFILE, KNOWLEDGE MACHINE and ECONOMICS.
- The Data Warehouse covers the data management of Arcadia, from the moment users upload data to the moment users grant other users or developers access to it.
- It is a private space for the user to store and access his data.
- The availability and reliability of the service should not rely on trusting a third party, but rather on a decentralised infrastructure where incentives exist to always maintain the service live, regardless of the value or the sensitivity of the data stored.
- Trust in the infrastructure, privacy around personal data and control over permissions will be core values of the Data Warehouse and by extension of Arcadia. It is paramount to show that, whether working directly or indirectly with personal data, these values are always ensured.
More detailed information about the requirements for this module can be found here
- The Knowledge Machine establishes a way for users and developers to collaborate together so that raw data from the users are converted into knowledge usable by other users and institutions.
- There exist built-in incentives for users to upload data and for developers to extract value from it.
- There must exist a way for users to share their data with developers without revealing information about themselves, and for developers to share their algorithms with everyone without losing their intellectual property rights.
- The Knowledge Machine should also produce outputs that have a consistent meaning for every user so that they can be used in different contexts without being prone to misunderstanding or manipulation.
- To improve upon algorithms that produce the outputs of the Knowledge Machine, ground truths should be used extensively. Mechanisms to establish the ground truth and compare different algorithms will therefore be required.
More detailed information about the requirements for this module can be found here
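One way to picture the ground-truth requirement is a small benchmark that scores competing algorithms against known answers. The sketch below is an assumption-laden illustration: the error metric (mean absolute error), the function names and the toy data are all hypothetical choices, not something specified for the Knowledge Machine.

```python
from statistics import mean
from typing import Callable, Sequence

Algorithm = Callable[[Sequence[float]], float]
Sample = tuple[Sequence[float], float]  # (raw data, ground-truth value)


def mean_absolute_error(algorithm: Algorithm, samples: Sequence[Sample]) -> float:
    """Average distance between an algorithm's output and the ground truth."""
    return mean(abs(algorithm(data) - truth) for data, truth in samples)


def best_algorithm(candidates: dict[str, Algorithm], samples: Sequence[Sample]) -> str:
    """Return the name of the candidate with the lowest error on the samples."""
    return min(candidates, key=lambda name: mean_absolute_error(candidates[name], samples))


# Toy ground truth: the value users confirmed for each piece of raw data.
samples = [([1.0, 2.0, 3.0], 2.0), ([4.0, 6.0], 5.0)]
candidates = {"average": lambda data: sum(data) / len(data), "maximum": max}

print(best_algorithm(candidates, samples))  # average
```

Any comparison of this kind is only as good as the ground truth behind it, which is why a mechanism for establishing it is listed as a requirement.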
- The Digital Profile module describes how to build a digital profile for the user based on the raw data he uploaded to the Data Warehouse.
- For users to interact with their digital profile in a human readable format, intuitive data visualisation will be required.
- The Digital Profile is also the place from which the user should be able to get information about his data in terms of economic value, interest from developers, offers to use it from institutions, offers to collaborate, results achieved, etc.
- The Digital Profile module should handle the functionality of sharing the digital profile with other users, developers or institutions. Users should be allowed to specify the terms and conditions whereby they agree to share their digital profile, as well as to whom, for how long, etc.
More detailed information about the requirements for this module can be found here
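The sharing terms mentioned above (to whom, for what, for how long) can be sketched as a small record that is checked on every access. This is an illustrative assumption about how such terms might be represented; the class, its fields and the example names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SharingTerms:
    """Hypothetical terms under which a user shares her digital profile."""

    recipient: str        # to whom the profile is shared
    purpose: str          # what the recipient may use it for
    granted_at: datetime  # when the user agreed
    duration: timedelta   # for how long the agreement holds

    def allows(self, requester: str, purpose: str, at: datetime) -> bool:
        """True only for the agreed recipient and purpose, within the agreed period."""
        return (
            requester == self.recipient
            and purpose == self.purpose
            and self.granted_at <= at < self.granted_at + self.duration
        )


start = datetime(2024, 1, 1)
terms = SharingTerms("research_lab", "sleep_study", start, timedelta(days=30))
print(terms.allows("research_lab", "sleep_study", start + timedelta(days=10)))  # True
print(terms.allows("research_lab", "sleep_study", start + timedelta(days=31)))  # False
```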
- Arcadia should be financially self-sustainable, which means that no profits are generated based on the use of data and everything else –apart from operational expenses– is redistributed to users, developers and institutions that maintain, use and improve Arcadia.
- A new digital economy based on data should emerge and make new services available, like job offers for users, services offered by clients directly to users, research programs for users and developers to enrol in, etc.
- The built-in economic model should create virtuous circles where participants benefit from the activity of other participants and by doing so make Arcadia better for everyone.
- It should also be possible to accelerate the social innovation through decentralisation and obtain a broader perspective on societies and how they work.
More detailed information about the requirements for this module can be found here
To build someone's profile is to make a statement about who you think that person is to you, not necessarily who that person really is. Leaving to the user the task of building his profile entirely is also problematic, as we rarely are objective about ourselves. In a decentralised infrastructure like Arcadia, we can bring the two together:
- Users only upload the data they want to upload, and by doing so they might be picking what they prefer over what is representative of them. They will tend to be more subjective.
- Developers can learn from other users, detect patterns and look for problems. They will tend to be more objective.
Unlike personality, a digital profile is not limited to a fixed number of elements. But we still need a structure to organise all those elements. Trait theory could help us define what the digital profile might be. Four basic models have been suggested: Allport, Cattell, Big Five and Eysenck. Currently, the Big Five and the Eysenck models are the two most popular ones. What they all have in common is a tree of traits that are combined to give an interpretation of the general personality of a human being. We would therefore suggest the following structure to organise the data around a new dedicated model for the digital profile.
DATA > VARIABLES > CHARACTERISTICS > FACETS > PERSONALITY
A key element of the digital profile for Arcadia is going to be the characteristic. A community of developers is going to be incentivised to create new characteristics and to make existing ones better. A characteristic is an algorithm that takes data as input, processes it in a certain way and returns an output as a result. The same characteristic applies to every user, but the output for each user will be different, because the data that feeds the characteristic will be different. Users should be interested in using characteristics to understand and visualise patterns, habits, preferences, achievements or mistakes in their own lives.
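A characteristic, as described here, is essentially a named function over a user's data. The following sketch shows one possible shape for it; the `Characteristic` type, the `savings_rate` example and the variable names `income` and `spending` are hypothetical illustrations, not characteristics defined by Arcadia.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

UserData = Mapping[str, Sequence[float]]  # variable name -> recorded values


@dataclass(frozen=True)
class Characteristic:
    """A named, versioned algorithm that turns a user's data into a single output."""

    name: str
    version: int
    compute: Callable[[UserData], float]


def savings_rate(data: UserData) -> float:
    """Illustrative algorithm: share of income that was not spent."""
    income = sum(data["income"])
    spending = sum(data["spending"])
    return 0.0 if income == 0 else (income - spending) / income


characteristic = Characteristic("savings_rate", 1, savings_rate)

# The same characteristic gives a different output for each user's data.
alice = {"income": [2000, 2000], "spending": [1500, 1700]}
bob = {"income": [3000], "spending": [3000]}
print(characteristic.compute(alice))  # 0.2
print(characteristic.compute(bob))    # 0.0
```

Keeping a version number next to the algorithm is one simple way to track which developer contribution produced a given output.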
Characteristics are elements one level of abstraction above the data. They don't say much but they say enough for them to be used in different contexts. This means that characteristics must be representative of the population of users. To verify that this is actually the case, users should have a way to recognise the validity (or the non-validity) of one particular characteristic. Examples of characteristics are: risk aversion, distribution of values, work balance in life, etc.
Looking at one of their characteristics, users should feel like they are looking in the mirror and recognise an aspect of their appearance. It should be that familiar.
Characteristics are self-explanatory: they include not only the data, but also the context in which the data has been generated, the interpretation of developers and the validation of the users. Characteristics should therefore be what users may want to share. The data that feeds into a characteristic will never leave the user's data warehouse.
Characteristics are then combined into facets, which for now are somewhat arbitrary categories that represent aspects of our life. Examples of facets include health, work, leisure, etc. Finally, the way facets are visualised together gives rise to the user's personality.
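The step from characteristics to facets can be sketched as a simple grouping. In this illustration the facet names come from the text, while the characteristic names, the scores and the choice of a plain average as the combination rule are assumptions made only for the example.

```python
from statistics import mean

# Hypothetical assignment of characteristics to facets.
FACETS = {
    "health": ["sleep_regularity", "activity_level"],
    "work": ["work_balance"],
}


def build_facets(outputs: dict[str, float], facets: dict[str, list[str]]) -> dict[str, float]:
    """Combine characteristic outputs into one score per facet (plain average)."""
    result = {}
    for facet, names in facets.items():
        available = [outputs[name] for name in names if name in outputs]
        if available:  # skip facets for which the user has no characteristic yet
            result[facet] = mean(available)
    return result


outputs = {"sleep_regularity": 0.5, "activity_level": 1.0, "work_balance": 0.25}
print(build_facets(outputs, FACETS))  # {'health': 0.75, 'work': 0.25}
```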
The idea is to get more accurate and useful insights from personal data and have enough users willing to put their characteristics to work in order to, at least, compensate for the costs of running Arcadia. If they prove useful to users and profitable to developers, the quality of characteristics will keep going up. At some point, Arcadia should therefore represent a better alternative for clients than what is currently available from digital platforms. To get there, the first step will be breaking silos of data and connecting them together into the user's data warehouse. The second step will be building a monetisation layer on top that provides the economic incentives for the players to want to participate, namely users, developers, institutions and clients.
Arcadia uses blockchain as the underlying decentralised infrastructure to let the user control his data, i.e. permissions management. But at the same time, blockchain can also be used as a means of payment (pay for the data you use) as well as a unit of measure (quality of characteristics). A pragmatic advantage of cryptocurrencies over fiat currencies is that they are borderless. A fundamental property of cryptocurrencies is the capability of engineering money with functionalities that fiat currencies simply don't have: for instance, having a trustless decentralised digital entity process transactions automatically only if a series of conditions have been met, making fast micro transactions, or staking money to vouch for the outcome of a task being performed without having to process the transaction in advance.
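The idea of a transaction that is processed only if a series of conditions have been met can be illustrated without any blockchain at all. The sketch below is plain in-memory Python, not a smart contract, and every name in it is hypothetical; it only shows the logic such a contract would have to encode.

```python
from typing import Callable


class ConditionalPayment:
    """Hypothetical escrow-like transfer that settles only when every condition holds."""

    def __init__(self, payer: str, payee: str, amount: int,
                 conditions: list[Callable[[], bool]]):
        self.payer = payer
        self.payee = payee
        self.amount = amount
        self.conditions = conditions
        self.settled = False

    def settle(self, balances: dict[str, int]) -> bool:
        """Move the funds if all conditions are met; otherwise leave balances untouched."""
        if self.settled:
            return False  # never pay twice
        if balances.get(self.payer, 0) < self.amount:
            return False  # insufficient funds
        if not all(condition() for condition in self.conditions):
            return False  # at least one condition is not met yet
        balances[self.payer] -= self.amount
        balances[self.payee] = balances.get(self.payee, 0) + self.amount
        self.settled = True
        return True


balances = {"client": 100, "user": 0}
approvals = {"user_approved": False}
payment = ConditionalPayment("client", "user", 30, [lambda: approvals["user_approved"]])

print(payment.settle(balances))  # False: the user has not approved yet
approvals["user_approved"] = True
print(payment.settle(balances))  # True
print(balances)                  # {'client': 70, 'user': 30}
```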
Each time a user's characteristic is used by a client, the user gets paid and so does the developer who created or upgraded the code for the characteristic. The user is the one who generated the data but the developer is the one who managed to extract knowledge from the data and make it valuable. Thanks to the work of the developers, users get to discover new aspects of their digital life and earn money when their characteristics are used by clients. Building characteristics is an activity of interest for developers because they will receive royalties each time the characteristics they helped build are used. At the same time, developers need the user's approval to be able to work on their data and to validate the characteristic they have developed. This means characteristics are built by developers but backed by users, and so only through a fair collaboration will the result be profitable for both.
Quality of the data is far more important than quantity. Therefore the incentives must be set so that quality is favoured over quantity. This will also be reflected in the price of the characteristic. Because Arcadia is considered a public infrastructure for personal data, there will be a minimum price for every characteristic in order for Arcadia to compensate for its running costs. The remaining value generated by the characteristics is redistributed to the participants (users for their data, developers for their algorithms and institutions for their services).
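As a rough illustration of this redistribution, the sketch below splits the price paid for one use of a characteristic. It assumes that the minimum price goes to Arcadia to cover running costs and that the remainder is divided by fixed weights; the weights, the function name and the use of integer token units are assumptions made for the example, not the project's economic model.

```python
def split_payment(price: int, minimum_price: int, weights: dict[str, int]) -> dict[str, int]:
    """Split `price` (in smallest token units) between Arcadia and the participants.

    Assumption: Arcadia keeps the minimum price; the rest is shared by weight.
    """
    if price < minimum_price:
        raise ValueError("price is below the minimum price")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    remaining = price - minimum_price
    payouts = {name: remaining * weight // total_weight for name, weight in weights.items()}
    # Rounding leftovers go to Arcadia so that the payouts always add up to the price.
    payouts["arcadia"] = price - sum(payouts.values())
    return payouts


# Hypothetical weights: 50% user, 30% developer, 20% institution.
print(split_payment(1000, 100, {"user": 5, "developer": 3, "institution": 2}))
# {'user': 450, 'developer': 270, 'institution': 180, 'arcadia': 100}
```

Working in integer units avoids floating-point rounding, so no value is created or lost in the split.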
Services developed by institutions will be able to plug into Arcadia and make use of the knowledge provided through characteristics. In the same way that users will be required to validate characteristics, the services offered by institutions must appeal to the users in order to be sold to clients, because the user will have to approve the use of her characteristics in each service. This means that the user is in control of how her data is used to build new knowledge, and even though she can decide not to share anything with anyone, she'll have incentives to do so: to gain new knowledge and earn money.
So how do we envision the new digital economy working? An in-depth analysis of the requirements for the economy of Arcadia can be found here.
Governance defines a set of rules and parameters that allow all the parties involved to collaborate. Those rules and parameters can be voted on if they need to be adjusted in the future, or they are defined only once if they are considered ground rules. In the context of decentralised blockchain applications, governance is also the place where economic incentives are adjusted so that participants can use the system to their own advantage without putting others or the system at risk.
In terms of governance, it is paramount for the development of Arcadia to increase the complexity of interactions and functionalities slowly over time, so that the probability of making mistakes can be kept as low as possible.
The first situation the governance model has to deal with is the level of participation from users and developers, so that enough quality characteristics are being built. Here is an example of how the system might be gamed: because nobody but the user has access to the data, the user might decide to upload data that is not what the user says it is. And here is another: because nobody but the developer has access to the code of the characteristics, a developer, in order to get the royalties for herself, might decide to upload a piece of code that is not actually better than the previous one. We could also imagine a scenario where users don't check that the characteristics actually reflect an aspect of their life, just to maximise their gain. Another scenario would be that some users and developers decide to collude in order to increase their respective gains, to the detriment of the quality of the characteristics a well-intentioned user or a client of Arcadia would get.
Global parameters for Arcadia will be part of the governance model to adjust the economic incentives or the requirements for performing certain activities in Arcadia.
The voting mechanism for executing the governance model should also be extremely well adapted to Arcadia. If governance power were proportional to the number of tokens an individual holds, large investors might bring the system down by behaving irrationally, or even deliberately.
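The risk of token-proportional governance can be made concrete with a small sketch. The balances and the function names `token_weighted_passes` and `head_count_passes` are hypothetical, and the head-count tally is included only as a contrast, not as the voting mechanism Arcadia would adopt.

```python
from typing import Dict


def token_weighted_passes(balances: Dict[str, int], votes: Dict[str, bool]) -> bool:
    """A proposal passes if the tokens voting yes outweigh the tokens voting no."""
    yes = sum(balances[voter] for voter, in_favour in votes.items() if in_favour)
    no = sum(balances[voter] for voter, in_favour in votes.items() if not in_favour)
    return yes > no


def head_count_passes(votes: Dict[str, bool]) -> bool:
    """A proposal passes if more voters say yes than no, regardless of tokens."""
    yes = sum(1 for in_favour in votes.values() if in_favour)
    no = len(votes) - yes
    return yes > no


# Hypothetical distribution: one large investor and forty small holders.
balances = {"investor": 600}
balances.update({f"user{i}": 10 for i in range(40)})

# The investor votes for a harmful change, every other holder votes against it.
votes = {voter: voter == "investor" for voter in balances}

investor_wins_by_tokens = token_weighted_passes(balances, votes)
investor_wins_by_heads = head_count_passes(votes)
```

With these made-up numbers the investor holds 600 of 1000 tokens, so the proposal passes under token weighting even though 40 of 41 voters oppose it, while it fails under a simple head count.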
There are still plenty of uncertainties that could prevent the project from working. Here is a preliminary list of reasons why it might not be possible to make it work. We are going to explore possible solutions to overcome those in a later article.
- Arcadia is too ambitious to be developed at once. But lowering the expectations might not be enough to make a difference.
- The process of uploading data is too tedious and requires the user to be involved in every step.
- It is not possible for users to access enough interesting data from the digital services they use.
- It is not possible to have developers work on the user's dataset without revealing the data itself.
- The minimum user base necessary to provide representative results from characteristics is too high.
- The average block speed is not high enough for Arcadia to scale.
- The cost of computing power on the blockchain is too high.
- The technologies of decentralisation like blockchain, decentralised data science, database encryption, decentralised file storage and computing power are not mature enough to bring Arcadia into production to the expected level of reliability.
- Users don't care enough about controlling their digital life to dedicate the time it requires to make Arcadia work.
- Measuring the performance of a new characteristic and making sure it complies with some quality standard is not possible in a decentralised fashion.
- Participants of Arcadia behave irrationally, and so there's no way to trust Arcadia to provide high quality results backed by users' approval and developers' integrity.
- The decentralised infrastructure is not reliable enough or not trusted by the participants, even though they might be interested in the project.
- Clients have specific needs that are not compatible with working with characteristics. They would prefer to work with the data directly.
- The economic incentives needed for developers to build characteristics are too expensive.
- The expected revenues that Arcadia will generate based on personal data might not be enough to compensate for its costs.