42 results in Facet Publishing
Conclusion
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
We have covered a great deal of ground in this book, and the diversion into physics and what we can learn from it may have surprised some readers – but why reinvent the wheel? Other disciplines such as physics have been around for much longer and are more formalised than where we find ourselves in data, so why not learn from them and from other professionals? We feel that when we started writing about data we were still in the Wild West stage of formalising data leadership and what it means for organisations. Time has definitely worked its magic: this area has moved on so fast, and so many wonderful voices have joined in the conversation, that the idea of using data as an asset and what that means in organisations have both developed considerably. That can only be a good thing!
We hope that we can continue to challenge ourselves in this discipline to learn from others both within the data space and also outside it, because then we can all become better.
We thought long and hard about what to call this book and finally decided on the title Halo Data because of its application to what we are all doing. Data hasn't changed, but hopefully this book will give you a different way of thinking about it that helps you. Halo data plays such a large part in championing the role that metadata and the ‘distance’ from the core data can have in using data and how it is described. The paradigm shift is about unlocking value. It isn't about data being the new whatever: it is about data being data and how it delivers value to the organisation.
Just thinking about data in a different way wasn't enough for us, because we also had to go through how you make it practical. If you don't use it, why bother collecting it in the first place? If nothing else sticks from reading this book, just remember that using the data to solve a problem or create value is what really matters.
The value proposition and the paradigm shift bring ethics into sharper focus because, while data can take us to new, exciting and innovative places, it can also take us into new, darker places.
2 - What is Metadata?
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
The simple answer to the question ‘what is metadata?’ is ‘data about data’. But that answer conceals so much rich detail about metadata. A bit like the analogy of the iceberg, the answer gives you the tip, but not the deep understanding of metadata that data professionals need in order to address the bigger question of data value.
Metadata has some early origins in the world of book publishing: who wrote it, when was it written, who published it and when, the territorial rights for publishing, who holds the copyright, how many chapters it contains, the subject matter and so on. If you look at a listing of a book on Amazon and scroll down a little you will see the metadata; in fact, the whole Amazon page is built from metadata about the book. The book is the data point; all the other information on the Amazon web page is metadata. Here is part of the metadata from Amazon for our first book, The Chief Data Officer's Playbook:
Product details
ASIN : 1783302577
Publisher : Facet Publishing; 1st edition (15 Nov. 2017)
Language : English
Paperback : 224 pages
ISBN-10 : 1783302577
ISBN-13 : 978-1783302574
Dimensions : 16.2 x 1.2 x 23.6 cm
Best Sellers Rank: 9,651 in Books (See Top 100 in Books)
26 in Data Warehousing (Books)
1 in Knowledge Management
81 in Beginner's Guide to Databases
Customer reviews:
4.2 out of 5 stars 75 ratings
Astonishing! It even goes down to the physical dimensions of the book. All of these pieces of metadata are important to different groups of people who probably have very different motivations and needs. The shipper or carrier is interested in the physical size of the book, the stockist is interested in the language, the foreign rights publishers too are interested in the current languages, the library and retailers are interested in the ISBN and prospective readers are interested in the customer reviews.
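As a rough illustration of the point above, the listing can be treated as one record, with each audience looking at its own subset of fields. The sketch below is a minimal Python example; the field names, the audience-to-field mapping and the `view_for` helper are illustrative assumptions, not Amazon's schema or API.

```python
# Illustrative only: field names and the audience mapping are assumptions,
# not Amazon's actual schema.
book_metadata = {
    "publisher": "Facet Publishing",
    "language": "English",
    "paperback_pages": 224,
    "isbn_13": "978-1783302574",
    "dimensions_cm": (16.2, 1.2, 23.6),
    "customer_rating": 4.2,
}

# Which pieces of metadata each group cares about, following the paragraph above.
audience_fields = {
    "carrier": ["dimensions_cm"],
    "stockist": ["language"],
    "foreign_rights_publisher": ["language"],
    "library_or_retailer": ["isbn_13"],
    "prospective_reader": ["customer_rating"],
}


def view_for(audience: str) -> dict:
    """Return only the metadata fields that matter to the given audience."""
    return {name: book_metadata[name] for name in audience_fields[audience]}


print(view_for("carrier"))
print(view_for("prospective_reader"))
```

The same record serves every audience; only the view changes, which is the sense in which one data point carries metadata for very different needs.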
Metadata is information about the content that provides structure, context, and meaning.
(Rachel Lovinger Metadata Workshop, 1 March 2012)
It serves to make it easier for others to discover, assess and utilize a dataset. Discovery and use are self-explanatory, but what is assessment? This term covers all the information that might be useful in determining whether or not one can or should use the data. It answers questions such as, does the data come from a trustworthy source?
8 - Halo Data and Data Ethics
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
What are data ethics?
There is much chatter about personal data and the accompanying legislation that is in place to protect both it and us. Whether that is the European General Data Protection Regulation (GDPR); the Data Protection Act 2018 (UK); Canada's Digital Charter Implementation Act; or Japan's Act on Protection of Personal Information, our governments are taking the protection of our personal data seriously, and this can only be a good thing. While the USA doesn't have a data privacy law applicable to every American state, individual states do have their own laws, such as the California Consumer Privacy Act (CCPA) – which is important, as California has a larger population and annual GDP than a good number of countries. The USA also has data protection provisions in the Health Insurance Portability and Accountability Act of 1996.
This type of legislation isn't new, and regulations about how we could use personal data predate all of the above examples; however, we just weren't taking them seriously. It wasn't until awareness of what was happening, and of its consequences, was raised that people seem to have woken up and decided that legislation regarding the collection, storage, processing and use of personal data needed to be taken seriously.
What many people don't realise is that this type of legislation only ever acts, and should only ever act, as a last line of defence. We should be choosing to do the right thing because it's the right thing, not because we will be penalised if we don’t.
This is where data ethics come in. There will be something of a circular discussion here, but it's important to understand the circle and why it exists.
There is (unfortunately) example after example of why we need data ethics – or as we like to put it, of when good data turns bad, as in the following:
• chatbots which have to be pulled from service after they start making racist comments based on biased data they have picked up;
• blindly following artificial intelligence (AI) decisions without understanding the implications or biases behind them;
• toys collecting data on our children.
The consequences of these types of actions dictate the necessity for data ethics.
3 - Other Ideas of Data Value and Monetisation
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
Why do we care about the value of data?
Before we get into what value is or should be, we need to understand why it is important. It is, or should be, obvious why it is important for the data folks (especially the data leaders within the organisation) to be able to demonstrate the value of data, because it is intrinsically linked to proving the value of their work. Everyone who is part of an organisation has to be able to prove their value – that's what objectives are all about.
However, when you are responsible for improving how data is used and driving value from it, then, because data underpins almost all business decisions, you will have a significant impact on the future direction of the company and on how well it does, adapts and even thrives based on the decisions it makes. When the data leaders or CDO can't demonstrate value, they won't get the investment needed for the organisation to work with data effectively, efficiently and creatively.
There are very real consequences for an organisation if it does not utilise data to its fullest extent.
• Time and effort are constantly wasted, as the organisation either is overwhelmed by data or struggles to make decisions because it has no idea which data is correct and which is as smelly as yesterday's fish.
• People get stuck doing repetitive tasks that should be easy to automate, and they end up bored and demoralised.
• Problems with data just grow. We have tried this before and nothing works, so why would it work if we tried it again? The problems with data and its use just become more convoluted.
• Often, organisations dig themselves deeper into a hole by making short-term fixes or ‘improvements’ so as to get the answer or result that they need now.
• And the list goes on.
But when the narrative changes and data is seen as a value creator rather than a drain on resources and a bottomless pit, then you can get
• the right kind of resources and effort focused on increasing the use of data to positively change the direction of the company;
• better engagement with people across the organisation as they focus on the interesting things rather than the monotony of manual intervention;
• more meaningful results and better decisions, leading to accelerated growth and creating a virtuous circle;
9 - Halo Data Framework
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
There is no point in all of the theory and explanations that we have covered so far if we can't put them into practical application. We began by saying that we are interested in solving problems and problems aren't solved by inaction. Without a focus on understanding, managing and using your data you will never be able to properly use the power of data to transform your organisation.
It can be really hard to decide what is the right thing to do next, or where to start. This could be a massive understatement in the current data and business environment where things are happening at such a fast pace. Where do you go first, what do you need to do first? Organisations that aren't constantly looking to improve, reinvent themselves or even just look at themselves will only slide backwards against increasingly competitive markets. The technology that we use around data, the science, the art and the processes involved are constantly changing at an astonishing rate. We need to make sure that we’re not just keeping pace with everybody else, but that what we’re doing is using data to keep us at the forefront of where we want to be. The drive for organisations to digitally transform and the radical rethinking on how enterprises want to use data are boosting how much they can change. It is happening at such a dizzying pace. The drive for agile thinking leads us on to more than just an agile project, but an agile way of life. Remember, it isn't just good enough to give people faster horses; in what we’re trying to do we need to really think about the end goal, and to make sure that we achieve it.
At the heart of thinking how your organisation will change itself for the better, or radically rethink its digital and technology capability, you need to think about the data first. Ultimately, the success of any kind of digital transformation demands three very critical elements: people, data and process. In all these things we need to have a high degree of trust so as to be able to move forward.
Data is a business problem, so it's a business problem to solve.
10 - Halo Data Applied Risk Assessment, Regulation, Customer, the Citizen
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
Risk frameworks
In our book Data Driven Business Transformation we emphasised the importance of managing risk. Data has to be part of the overall risk process, and you need to think beyond the limits of just regulatory and legislative risk. There are a number of elements to truly understanding data risk.
• Definition of information risk: This should be a clear, concise description that explains the risk so that anybody reading it can understand it. Avoid jargon and acronyms. Would somebody outside your organisation be able to understand the risk as you have defined it?
• Early-warning indicators: What metrics will you put in place to indicate that you’re moving into a danger zone? Assign indicative tolerance levels to your early-warning indicators, and monitor them to ensure that they are doing the job right, and modify them if necessary.
• Causes: Look at both internal and external factors, remembering to include competitive elements, any change in demand force, better use of technology, human risk, changes in both internal and external control and the potential for mismanagement.
• Risk assessment: Assess each risk in terms of safety, performance, finance and reputation (political), and overall. Your organisation should have a numbering system for assigning assessed risk levels. Tie in with the corporate risk assessment, using the same system for your data and information risk assessment.
• Risk assessment rationale: This is the ‘why’ section. Document the rationale for your assessment of the impact or probability of each risk. This will help to communicate the significance of each risk and place it into context with other organisational risks, so that the organisation can appropriately tailor its assets and resources relative to the likelihood of each risk happening.
• RACI (responsible, accountable, consulted and informed) around risk: For each different risk area you should understand who is responsible versus who is accountable. Whose opinion do you need to ask and who do you just need to keep up to date with what is happening?
• Existing controls, causes and consequences: Look for current controls within the organisation that you can use to monitor each risk. Are other processes already happening that the organisation needs to complete that will impact on the risk? Be honest.
• Improvement actions: This covers two areas: (1) try to stop the risk happening in the first place and (2) try to minimise the impact of the risk if it does happen and there is no way of stopping it.
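The elements above can be gathered into a single risk-register entry. The following is a minimal Python sketch under stated assumptions: the class and field names, the 1 to 5 likelihood and impact scales, the likelihood-times-impact score and the example risk are all hypothetical, and should be swapped for whatever numbering system your corporate risk assessment already uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EarlyWarningIndicator:
    name: str
    current: float
    tolerance: float  # assumption: a value above this means "danger zone"

    def breached(self) -> bool:
        return self.current > self.tolerance


@dataclass
class DataRisk:
    definition: str          # plain-language description, no jargon
    causes: list[str]        # internal and external factors
    likelihood: int          # assumed 1-5 scale
    impact: int              # assumed 1-5 scale
    rationale: str           # the 'why' behind the assessment
    raci: dict[str, str]     # responsible/accountable/consulted/informed
    existing_controls: list[str] = field(default_factory=list)
    improvement_actions: list[str] = field(default_factory=list)
    indicators: list[EarlyWarningIndicator] = field(default_factory=list)

    def score(self) -> int:
        """Assumed scoring rule: likelihood multiplied by impact."""
        return self.likelihood * self.impact

    def breached_indicators(self) -> list[str]:
        """Names of indicators that have moved past their tolerance."""
        return [i.name for i in self.indicators if i.breached()]


# Hypothetical example entry.
risk = DataRisk(
    definition="Customer addresses are out of date, so bills go to the wrong place.",
    causes=["No process for customers to update details", "Manual re-keying"],
    likelihood=3,
    impact=4,
    rationale="Returned mail is rising and each misdirected bill delays payment.",
    raci={
        "responsible": "Data steward",
        "accountable": "Chief data officer",
        "consulted": "Customer services",
        "informed": "Risk committee",
    },
    existing_controls=["Monthly returned-mail report"],
    improvement_actions=["Prevent: online self-service updates",
                         "Minimise: hold bills for flagged addresses"],
    indicators=[
        EarlyWarningIndicator("returned_mail_rate_pct", current=2.5, tolerance=2.0),
        EarlyWarningIndicator("unverified_records_pct", current=8.0, tolerance=10.0),
    ],
)

print(risk.score())
print(risk.breached_indicators())
```

Keeping the indicators and their tolerances on the entry itself makes it straightforward to monitor them and adjust the tolerances if they turn out not to be doing their job.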
1 - Who Owns the Definitions and Terms about Data?
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
What was happening before data professionals arrived?
Until recently IT (information technology) has owned the ‘data terminology’. With the rise of the CDO (chief data officer) and organisations wanting to increase the value they derive from their data, we need to think about data in new ways. Data is now a discipline in its own right outside of IT; it is growing up, but isn't yet fully mature. Huge technological steps have been taken, but some fundamental thinking has been omitted. People are trying to organise and govern their data, but they are struggling to get or identify a Return on Investment (RoI) on that activity and to truly release the value of data. We need to find a way to accelerate data science and analytics so that we’re not just looking at their potential but have realised their benefits which will allow organisations to do more with data for less cost.
No doubt we all still read many articles and posts about data, and hear the data community discussing the problems encountered and created by data being a subset of the technology domain within organisations. This is a constant and ongoing discussion that has a commonality across vertical markets and geographies. Even if the data team aren't actually a subset within an organisation's IT department, in many cases that perception exists; and even if it doesn’t, many problems may persist, due to data having previously been a subset of the technology domain. What do we mean by ‘being a subset of technology’? We mean that essentially the CDO reports up to the CIO (chief information officer) or CTO (chief technology officer) or that data was part of the IT or technology teams reporting to the CIO. Even if the reporting lines aren't hierarchical and the CDO sits alongside the CIO/CTO, this sustains the perception of a subset.
We have been calling out this issue for a long time; indeed we wrote about it in our first book, The Chief Data Officer's Playbook, in 2017. At the risk of covering some of that ground again it is worth repeating a few points. If the CTO and the use of technology were going to ‘crack the data problem’ and ‘leverage the power of data’ then surely, after decades of the CTO/CIO role being established in organisations, this would have been achieved by now?
11 - Halo Data and Storytelling
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
Data storytelling is the process of translating data analyses into understandable terms in order to influence a business decision or action. Data analysis focuses on creating valuable insights from data to give further context and understanding to an intended audience.
With the rise of digital business and data-driven decision-management, data storytelling has become a skill often associated with data science and business analytics. The idea is to connect the dots between sophisticated data analyses and decision-makers who might not have the skills to interpret the data.
(Alexander S. Gillis and Nicole Laskowski, ‘Data Storytelling’, Techtarget, December 2022, www.techtarget.com/searchcio/definition/data-storytelling)
Why storytelling is needed in data
Gillis and Laskowski provide an excellent definition of data storytelling and why it is important. Often we use data to drive decisions, create action or influence opinions and positions. This may be a decision about a performance marketing strategy, about a financial investment or transaction, about how much raw material to buy or how fast to run the engines. As data professionals we all know and hope that decisions are based on data and enabled by data, if not actually data driven. One problem that we have, however, is that not all decision makers are sufficiently data literate to understand or interpret the raw data that is put in front of them, either in the dreaded spreadsheet or on the equally dreaded ‘dashboard’. It is an even greater leap to expect business decision-makers to understand the output from an ML model or indeed the model itself or how it was trained or how it is calibrated. To get around these issues data leaders have increasingly realised that they need to be interpreters or narrators so as to make the data understandable and to enable engagement with insights. Scott Taylor, The Data Whisperer, is one of the leading practitioners and evangelists for data storytelling. It is all about connecting the dots, creating the narrative that brings the data alive to impact on decisions and actions. We often talk about ‘actionable insights’, but if the insights don't create action then they are of little value.
What we are talking about here is using the power of data storytelling to influence operational decisions for the course of the business.
That, however, is only one aspect of the storytelling. Often data leaders need to create the narrative that builds the connection between a data activity and a business outcome.
Introduction
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
Ever since the mathematician Clive Humby coined the phrase ‘Data is the new oil’ in 2006, we have all become a bit obsessed with what data is ‘like’ so as to sell its virtues, to convince more people to be data cheerleaders and work with data as an asset. We have seen the phrase used on numerous occasions: world leaders, business leaders and publications worldwide have picked it up and acted as if it was the most important thing that Humby said. Michael Palmer, writing a blog post in November 2006, stated: ‘Data is just like crude. It's valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value’ (https://ana.blogs.com/maestros/2006/11/data_is_the_new.html).
The point to be understood is that data in its raw form doesn't really do very much. Humby's phrase also conveys that data can be used in many different ways and can be turned into a multitude of different and varied products for us to get some value from it. It sits there full of potential, waiting for us to refine, clean, link, structure and analyse it; basically to unlock it so that we can turn it into a model for predicting when extreme weather will affect us, or how to cope with spikes in demand in our medical services, or how to predict customer or citizen behaviour.
The phrase also highlights that oil and data have some attributes in common. There may be some value in taking our understanding of how we use oil and applying it to data. We can look at how oil as an asset is treated and draw useful parallels for how we can treat data. From understanding what stages oil goes through and how it is treated, we can move on to thinking about how the same principle can be applied to data and the processes data needs to go through in order to be useful; in other words, the refining process, understanding what it is going to be used for, the preparation phase and so on. The words also bring to mind the engineering and the energy required to convert oil into something useful (think about the complexity and scale of an oil refinery).
6 - Getting to Know Halo Data
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
Bohr's theory of the atom
In Chapter 5 we looked at going ‘beyond’ data to more comprehensive data that extends the boundaries of conventional laws and thinking. This data creates the Halo around the original data point.
There is more to learn and understand about Halo data from the discipline of physics. In Niels Bohr's presentation of the model of the atom in 1913, the most stable, lowest energy level is found in the innermost orbit. This first orbital forms a shell around the nucleus and is assigned a principal quantum number (n) of n=1. So the metadata is the most stable. In data science terms it has the lowest potential energy to release but has attained the highest order, n=1; it therefore has the greatest realised value to the business, occupies the innermost orbit and forms a shell around the central data point.
Additional orbital shells are assigned values n=2, n=3, n=4, etc.
As electrons move further away from the nucleus, they gain potential energy and become less stable. So, with our Halo data, as we move further away from the data point and more into assumption and unverified data, the data becomes less stable but the ‘potential energy’ of that data increases. For example, the political leanings of Peter may be unverified, a matter of assumption rather than fact, and so that piece of data may sit out in the n=6 orbit – but it may have huge potential energy if we can verify it as a fact. But as a ‘fact’ at this stage it is very unstable and very unassured: the confidence level is low. Our Halo data fits Bohr's model of the atom.
To continue with Bohr's model, atoms with electrons in their lowest energy orbits are in a ‘ground’ state, and those with electrons at higher energy orbits are in an ‘excited’ state. Quantum mechanics describes the movement of electrons from an outer orbit to an inner orbit and energy being released. So, data points (remember our Peter example) with simple metadata associated with them are in a ‘ground state’. Data points with a Halo of data are in an ‘excited state’. As data professionals, as data scientists, we want data in an excited state: this is where the ‘potential’ exists.
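For readers who want the physics behind the analogy, the standard Bohr-model result for hydrogen (a textbook formula, not taken from this chapter) gives the energy of the orbit with principal quantum number n, and the energy released when an electron drops from an outer orbit n_i to an inner orbit n_f:

```latex
E_n = -\frac{13.6\ \text{eV}}{n^2},
\qquad
\Delta E = E_{n_i} - E_{n_f}
         = 13.6\ \text{eV}\left(\frac{1}{n_f^2} - \frac{1}{n_i^2}\right),
\quad n_i > n_f .
```

On these figures n=1 sits at −13.6 eV, the lowest and most stable level, while n=2 sits at about −3.4 eV, so a drop from n=2 to n=1 releases about 10.2 eV. In the analogy, this released energy corresponds to the value realised when outer, unverified Halo data is confirmed and moves inward.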
4 - Value from a Different Source
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
A few years ago we had the pleasure of meeting Catherine Mandungu, who is an expert in revenue operations and spends her time helping businesses to think about their revenue in a different way. She approaches data and its value from that perspective. We invited her to share her thinking with you, as it demonstrates how the value of data is used from a non-data professional point of view.
Your morning alarm goes off. You stretch a little, have a glass of water and reach out to your phone. Some time ago you bought an app, SleepScore, which tracks your sleep activity. Looks good, you had a peaceful night. You also check your Instagram and Facebook to check what you have missed while asleep. You like a few healthy recipe pages. You come across an advertisement about a yoga app. Perfect! You’re all about health and fitness. Maybe you’ll subscribe for a trial. Now, time to get ready for a working day.
You manage a commercial team at an analytics company for ecommerce businesses and have sales targets to hit, so before your online meeting with the team you check the performance stats. You need to figure out where productivity gains could be made. Later in the day, you speak to the Head of Product. You have been collecting data about deals that were lost due to product (lack of) features, and the business needs to deliberate on how to improve the product so more deals can be closed.
After a long working day, you pop down to the grocery shop to get something for dinner. At the checkout it asks you if you have a loyalty card. Yes indeed. Register the goods you bought; you might get a discount. Once dinner is out of the way, it's time to unwind. You decide to watch a movie on Netflix. Immediately it recommends some movies and series you might like. Great! It was going to be too much effort to look for something to watch. Finally, to bed, but you sometimes have trouble sleeping, so you got a sleep and meditation app called Calm to ease you into it. You’re in bed and ready. Press play.
We are living in a digital world which has fuelled a data economy.
5 - Hello Halo Data
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
A reminder about metadata
Let's remind ourselves of a question we posed in Chapter 2, ‘What is Metadata?’ Are we stuck in 1967? Are we stuck in the Summer of Love, Sergeant Pepper's Lonely Hearts Club Band, mini-skirts and the Ford Mustang (although we are both rather partial to the newer version of this car)? Perhaps more pertinent, are we currently stuck with an understanding and usage of metadata set by the initial thinking of 1967? Haven't we moved forward in our thinking? In Chapter 2 we also briefly explored the definition of ‘meta’ (‘after’, ‘beyond’, ‘more comprehensive’) and aligned it with metaphysics: beyond the physical laws of physics. So again we pose the question: shouldn't we be taking metadata beyond the accepted laws and definitions cast in the mould of 1967?
Again looking back to Chapter 3, we had two definitions of ontology and we discussed the second in some detail: ‘a set of concepts and categories in a subject area or domain that shows their properties and the relations between them’. We said that we would return to the first definition: ‘the branch of metaphysics dealing with the nature of being’. This first definition makes the link between ontology and metaphysics and the nature of the ‘being’ of data. This may help us to understand the inherent value in data and how it changes. It may also help us to better understand the very nature of data and explore those concepts of the ‘single version of the truth’ or the ‘golden source’.
Data and metaphysics
Let's move the conversation along and step into the world of metaphysics. Consider the ‘old school’ view of the atom, its orbiting electrons and nucleus. This is analogous to our current view of metadata: a few pieces of information (electrons) circulating around the data point (the nucleus), in very structured and defined paths. The electrons (the metadata) occupy very defined paths and orbits, and perhaps we can see these as the types of metadata: structural, descriptive and administrative, or perhaps even as the W7 framework. The orbits occupied by the electrons (metadata) are also very structured, so perhaps they are the ontology of the metadata. Applying these concepts from the old-school view of the atom to metadata demonstrates that our current view of metadata is very rigid and fixed.
7 - Early Examples of Halo Data Approaches
- Caroline Carruthers, Peter Jackson
- Chapter
Summary
Why hasn't the paradigm shift happened before?
Before we look at a couple of examples, let's recap our paradigm shift on metadata. We now see metadata as a Halo around the CDE. The CDE is the nucleus; the quantum element attached to it defines its purpose, and the metadata in the Halo either adds value to the CDE or has the potential to add value. The data in the metadata Halo is assigned four values:
n – the distance of the data from the nucleus
v – the value that the metadata provides to the CDE
p – the potential that the metadata has to provide value to the CDE
c – the level of confidence given to the metadata.
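One way to picture the four values is as fields on a record attached to each piece of Halo metadata. The sketch below is a minimal Python illustration, not the authors' implementation: the `HaloItem` class, the 0 to 1 scales for v, p and c, the thresholds in `worth_verifying` and the example items are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HaloItem:
    name: str
    n: int    # distance (orbit) from the nucleus; 1 is the innermost
    v: float  # value currently provided to the CDE (assumed 0-1 scale)
    p: float  # potential to provide value to the CDE (assumed 0-1 scale)
    c: float  # confidence in the metadata (assumed 0-1 scale)


def worth_verifying(items, min_potential=0.7, max_confidence=0.5):
    """Return high-potential, low-confidence items, highest potential first."""
    candidates = [i for i in items if i.p >= min_potential and i.c <= max_confidence]
    return sorted(candidates, key=lambda i: i.p, reverse=True)


# Hypothetical Halo around a single customer data point.
halo = [
    HaloItem("date_of_birth", n=1, v=0.9, p=0.1, c=0.95),
    HaloItem("inferred_household_size", n=3, v=0.3, p=0.5, c=0.6),
    HaloItem("political_leaning", n=6, v=0.0, p=0.9, c=0.2),
]

print([item.name for item in worth_verifying(halo)])
```

Filtering on p and c together surfaces the outer-orbit items that are unverified but could add the most value if confirmed, which is where effort on verification is most likely to pay off.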
We are pushing the concept of metadata beyond the limits of the 1960s definition to demonstrate that metadata in its own right has the potential to deliver value to the business.
This all might seem obvious and straightforward: we’ve always had metadata and we’ve always had some concept that data provides value to the business. But have we really? How much value do organisations really put on the metadata that they already have, let alone that they might collect and use? Across the wider business it is likely very little; and it is also likely that they don't leverage the value locked up in the metadata.
We are only now starting to talk about data having value and demonstrating that value; the idea that the metadata could have a demonstrated value as well is lagging behind in our thinking.
There are a number of reasons why organisations are still stuck in the 1960s in their understanding of metadata.
1 There is a lack of creative thinking about metadata as a source of potential value. This was discussed in Chapter 2. Metadata has been viewed as an unexciting chore that must be dealt with, with no obvious RoI but plenty of overheads, looked after by the technology teams.
2 Organisations have not been accustomed to searching for and ingesting new and different types of data that may or may not deliver value to their existing datasets.
3 Linked to the above, but a reason in its own right, is a fear of failure. Organisations aren't prepared to invest in a ‘data project’ if there is no clear RoI – and even more so if they can't show that it has already effectively been done somewhere else.