More and more organisations are turning to big data to inform their decision-making. However, many find that all is not well when they try to use big data in this way: they frequently get disappointing results and end up abandoning the exercise or, worse, making bad decisions as a result of big data analytics.
A new study by researchers Maryam Ghasemaghaei and Goran Calic from the DeGroote School of Business at McMaster University, looking at why organisations often end up abandoning the use of big data in their decision-making, makes for useful and interesting reading.
In this podcast I interview Dr. Maryam Ghasemaghaei about her research and findings, which will be useful to any organisation using or thinking of using big data for decision-making.
Dr. Maryam Ghasemaghaei is an Assistant Professor of Information Systems at the DeGroote School of Business at McMaster University.
Dr. Goran Calic is an Assistant Professor of Strategic Management at the DeGroote School of Business at McMaster University.
David: 00:10 Okay. Hi, and welcome to another podcast. Today we’ve got Maryam Ghasemaghaei from DeGroote University at McMaster University. So DeGroote Business School at McMaster University. And she, together with a colleague of hers, has authored a very interesting paper about big data that’s entitled “Can Big Data Improve Firm Decision Quality? The Role of Data Quality and Data Diagnosticity”. So this is in the Journal of Decision Support Systems, and it’s a really interesting paper and we’re going to be talking about it today.
David: 00:59 So Maryam, can you just take a couple of minutes to introduce yourselves. Give the listeners a little bit of background about your personal journey so far, and a flavor of your academic history. Kind of how you got here in terms of your research interests?
Maryam: 01:13 Yes, sure. So first of all, thank you very much for having me today. So as you said, my name is Maryam Ghasemaghaei. And I am an assistant professor at DeGroote School of Business at McMaster University in Canada. So I actually joined McMaster in 2016 as a faculty member. But, I did my PhD, also at McMaster in Information Systems. So it’s usually rare to actually hire internally when you get your PhD at any university and they don’t usually hire you to become a faculty member. But it was a rare choice, because I had some certifications from different universities in big data and data analytics. And they actually wanted to hire somebody to teach these courses and research this area.
Maryam: 02:02 So they were actually really, really interested in my CV and they offer me a position and also my husband also got an offer. But he is an electrical engineer again at McMaster University. So we were both hired in 2016 and this is why they say, “Okay that’s good. So the life will be much easier because we both work at the same university.” So this is why we decided to actually stay at McMaster and continue our academic career here.
Maryam: 02:35 But I didn’t study Information Systems for my Bachelor and my Master degree. I studied Industrial Management for my bachelor and then I was interested more in Marketing then I studied Marketing for my Master degree and then I wanted to actually apply for PhD. I was reading a lot of papers and I found out I’m really, really interested in understanding the use of the technology by individuals and users and this is why I just applied for Information Systems.
Maryam: 03:09 In the first few years of my PhD, most of my research were a lot related to human computer interaction. And for example I had a lot of projects as I was just designing new technology for people. For example, for older adults, younger adults and try to understand which design is more effective and more useful for a specific type of demographic of the people. [inaudible 00:03:41]
David: 03:42 Oops. Seems I’ve lost you somewhere.
Maryam: 03:45 [inaudible 00:03:45] in that area, sure. So while I was working in that area, I always wanted to work on several projects. Not only my thesis, but several other projects simultaneously, because I always loved to do research.
Maryam: 04:01 So when I was doing more research I found that actually there is an interesting area in the area about Big Data and Data Analytics and we don’t have much empirical studies in that area. But I found a lot of articles, but they are all conceptual articles. We don’t in data culture we don’t have any models nothing. There were all just on conceptual papers and they were saying that, “You know what? Big data is really, really important and all the firms want to actually process big data to improve their firm outcomes.” However, when I was reading all these conceptual papers, I found that most of them said that, firms, a lot of them invested in processing big data but most of them actually failed. They couldn’t successfully improve the firm outcomes by utilizing big data and data analytics.
Maryam: 04:56 So this is why I became more and more interested and started to have several projects even though I was a PhD student and I haven’t started my academic career as a faculty member. So this is why I have started in this area a few years ago and by now I focus on different, different areas. Because I always wanted to understand, what are some of the more successful factors that actually help firms to enable them to improve their firm outcomes when they are implementing and processing big data. So this is when I started to actually focus on these types of projects.
David: 05:39 Right. Okay. So what kinds of things are organizations using big data for to make decisions about?
Maryam: 05:43 You know what? Right now they are using, for example... So let me first introduce how I define big data, actually. There are three main characteristics that define big data. When we talk about big data, it’s volume, velocity and variety. These are the three main big data characteristics. So when I am talking about volume, it’s the size of the data. When we talk about variety, it’s the types of the data that organizations are processing.
Maryam: 06:17 Like several years ago, organizations were mostly focused on processing structured data. Like numbers for example. Because we didn’t have advanced technologies, right? But nowadays, organizations are trying to process unstructured data. For example, customers put a lot of comments on social media. Like for example, they put the comments on Twitter, on Facebook, everywhere, right? They just put comments. So if there is a company and they have a specific product, they just want first to know what people actually think about their product. Maybe they are satisfied, maybe it is negative, maybe it is positive, maybe it is neutral.
Maryam: 07:02 So by analyzing all these data, that they could get from different sources, they would actually understand how they could improve their performance for example. How they could become more agile. How they could improve their decision making. So, for example if they want to have a new product in a company, how they can process these data by processing large size of data, different types of data. And the third term is velocity, which is about the speed of processing the data. So how quickly. Because they need to be so quick, right? In analyzing the data and making a decision. If not, their competitors would do that. So this is why they are trying to use some advanced technologies to process large, big data.
Maryam: 07:50 And recently I found in one article that people always come up with different needs for big data, adding more characteristics. So I heard that they are identifying 42 Vs right now. But actually the three ones [crosstalk 00:08:07]. Imagine also. The three Vs. Volume, velocity and variety are the most important ones. But, they are trying to process [inaudible 00:08:18].
David: 08:17 Yes.
Maryam: 08:17 Sometimes one of them is a disadvantage.
David: 08:21 So what you are saying is that data diagnosticity should be based on these three characteristics. Volume, variety and velocity of data.
Maryam: 08:32 Exactly. Exactly! So organizations are processing big data to enhance their decision making and data diagnosticity which is. The meaning of data diagnosticity is try to deeply understand what’s going on in the data. Because, firms are having huge amounts of data that they can just get it from everywhere, right? But they just need to get insight to understand for example, what has happened in the past. What is going on right now and what they can do in the future. And this is why big data could help them, but as I mention in this book one of the most important variables that they need to consider is the quality of the data.
David: 09:16 Yes. Yes. You mentioned just before we get onto the quality of the thing. You mentioned in the paper about confidence in the data.
Maryam: 09:26 Yeah. Yeah.
David: 09:27 Do you just want to explain what you mean by that and why confidence is important?
Maryam: 09:33 Confidence is really, really important. Because you know, and one of the most important things that leads to have confidence in the data and in the results [inaudible 00:09:42] is this [inaudible 00:09:46] have confidence of the results that they get from processing big data. They wouldn’t actually be able to protect the decision making quality. So this is why they are saying that there are different variables that need to increase the confidence of the organizations in the data and the results that they got from the data in order to be able to enhance their outcomes.
David: 10:10 Okay. And so one of the findings seems to be that actually the level of confidence that people within the organization has about the data affects, their decision making capability using the big data.
Maryam: 10:28 Yeah, yeah, exactly. So it depends. So I found actually there are several interesting results. But one of them is that big data doesn’t impact equally on different categories of the data quality. And then I can just also go through the types of the data quality that we have. And then it will impact on the data diagnosticity and also different categories of the data quality can also differently impact on data diagnosticity. So, if you want me to go through.
David: 11:00 Yes please.
Maryam: 11:01 More data of it.
David: 11:01 That will be useful.
Maryam: 11:05 So I have done a lot of research in the area of data analytics, big data and its impact on firms’ outcomes. I have published several of them, and there are a lot of papers that right now are under review. In one of the papers that I published in JDSS in 2017, which is one of the popular journals in Information Systems, I tried to define data analytics capability and competency. So that one is based on different variables that enhance decision making in the firms. One of the important variables that I saw that enhances and forms data analytics competency in the organizations is data quality.
Maryam: 11:52 And also, I worked on a paper that was a literature review paper, in which I analyzed more than 500 papers that focused on information quality. To understand what is information quality, what is data quality. When we talk about data quality, what does that even mean? Based on all these papers that I have read, I found that there is a very popular framework that was developed in 1996 by Wang and Strong. They categorize data quality into four categories. One of them is intrinsic data quality, the other one is contextual data quality, the third one is representational data quality, and the fourth one is accessibility data quality.
Maryam: 12:36 So, I was thinking that okay, that’s a [inaudible 00:12:45] new introspection to see that data quality has a rule frame work to enhance the decision making quality based on processing big data. I was thinking that, okay, maybe just considering and measuring data quality as one variable is not enough. So this is why I was trying to find a really good framework and a popular framework that could actually cover this concept. So, intrinsic data quality is really talk about actually the real value of the data. Contextual data quality is mostly to say that if the data is actually make sense in the context that we want to use. Representational data quality is that when we have a data and when we want to analyze it and when we are looking at the results, it should be actually easy to understand. If we just have a lot of data and that we don’t even understand what’s going on in the data that wouldn’t help. So that is also one category of data quality.
Maryam: 13:59 So accessibility data quality is that when we want to have access to data, is it easy to have access to that or maybe is not easy to have access to that? So this is why I wanted to try to understand, when firms are processing big data, how that big data, which is said to be formed by data volume, data variety and data velocity, would impact on data quality categories, because maybe it doesn’t impact on quality. And how data quality impacts on data diagnosticity, which is generating insight and would enhance the quality of the firm’s decisions.
Maryam: 14:40 And so I found really interesting results. Actually some of them were so surprising for me. So do you want me to talk about the results?
David: 14:47 Yes please! That would be excellent.
Maryam: 14:53 Excellent. So I actually was expecting that. So the opposite arguments and discussions in the paper about the impact of big data and data quality. Because whenever I was going through different papers, there were some papers, I say, “Oh you know what, firms are processing big data, it could decrease the quality of the data.” And this is why a lot of firms are scared. Because they would say that, “Okay. When we get data from everywhere. So it would reduce the quality of the data and cannot enhance the quality of our decision.” So, this is why I try to have different arguments in the paper that, how actually big data could impact on different categories. It’s not only one category.
Maryam: 15:40 The interesting finding that I found based on the results I had got, is that actually the results is based on the 130 participants that I had. And they [inaudible 00:15:46]. Just found them. And based on the results that I got from these 130 participants, I found that big data actually reduces intrinsic data quality. However, it enhances and have a positive impact on the other type of the data quality categories. And it was something so surprising for me because I thought that actually big data could equally or could positively impact on all or maybe negatively impact on all. But I found that firms are processing big data. It reduces the intrinsic data quality, which is the real value, or the accuracy of the data.
Maryam: 16:40 I was trying to find a justification for that. But I was thinking that maybe the reason could be that firms are trying to get data from everywhere, right? So for example from social media, from blogs also. Everywhere they could. Also now they are using advanced technology. For example a lot of firms have started to use Hadoop clusters. Like Google, like Facebook. So they are now able to store more data and process more data. But they are getting it from everywhere. So this is why maybe when we get. So it’s got to be like garbage. You don’t want to have garbage in, garbage out, right?
David: 17:20 Yes.
Maryam: 17:21 They get data from everywhere. So maybe this is why it has a negative impact on intrinsic data quality. But for the other categories. So for contextual or for example for representational. Nowadays organizations are using a lot of, for example representational. They are using advanced technology for visualization for example. And then they could really have more data. They could just have really nice graphs and until you know? And pie charts and nicely see what’s going on in a data. So it helps actually when they process big data. They could nicely see like through the graphs and pie charts that what’s going on in the data.
Maryam: 17:59 And also through. So they have more accessibility to data. The other interesting result I found was that the direct impact of big data on decision making quality was not significant. So this shows that, and this means that firms need to make sure that they have high quality data which enhances the insight they could generate in order to be able to enhance their decision making quality. If not, they may not be able to do that, and I think this is why the failure rate of processing big data is really high, because you know, it’s a hot topic.
Maryam: 18:39 So I was talking to one of our companies here in Canada in Ontario in Canada. And they were saying, “Oh yeah it’s a hot topic. It’s a hype. Everybody wants to process big data. Everybody wants to have advanced technologies. To try to just get all these data from different sources and just to analyze their [inaudible 00:19:04] it [inaudible 00:19:04].” It uses really good. But if we do not spend a lot of time on the quality, to enhance the quality of the data they may not actually be able to improve their outcomes. And here in this paper, I specifically focus on decision making quality.
David: 19:21 Yeah.
Maryam: 19:22 So yeah. And one more interesting result was that for data diagnosticity, the data quality categories like the intrinsic category, contextual and representational positively impacted on increasing the insight we obtained from processing big data. However, accessibility did not have an impact. And maybe it’s because nowadays organizations can easily get their data from everywhere, right? So it does not technically come in hard. But it also shows that the quality of the data enables firms to actually increase the insight that they generate from the data they have.
David: 20:03 Yes. Do you get the sense that people within organizations when they are using big data can have a good understanding about the types of data that they’re using?
Maryam: 20:15 Yes. Exactly, exactly. That’s a really, really good question. Sometimes they do not make that. They do not even understand the data. They just want to get the data from everywhere that is possible.
David: 20:26 Yes.
Maryam: 20:28 Sometimes it’s not even a relevant data, you know? So you could have a huge amount of data, but it’s not even relevant to what you want to do. So what’s the point of getting it? And what’s the point of having spending a lot of money on that and not being able to enhance the outcomes? So yeah. Interestingly, I talk to a lot of data analysts and they said that in order to increase the insight that they are generating from data, they need to spend about 88 or 85% of their time to just clean the data. So this is a really, really important step that a lot of times companies just ignore or overlook that. And they just want to get a result so. But again if you don’t clean that, if you have low quality data, so you can’t get exactly what you expect.
David: 21:24 Yeah. It sounds like this is a call for organizations to learn how to diagnose the types of data they have got and understand the quality of the data and what it can actually be used for before rushing off and using some of these tools.
Maryam: 21:39 Exactly. Exactly. And this is why hiring with the. I know in Ontario a lot of companies have problems with actually hiring people that, who knows very well about processing. Who have that technical knowledge and not just that technical knowledge, but also from both [inaudible 00:22:03].
David: 22:02 Woops.
Maryam: 22:03 [inaudible 00:22:03] to analyze and process the data have dual skills. Not only technical, but also understand what’s going on in the data.
David: 22:11 Yes, yeah. And when you were doing the study did you find many organizations that actually have that capability?
Maryam: 22:20 Yes. So there were actually some companies that had that capability. But still those companies that do not consider the quality of the data are more than the ones that actually considered the quality of their data perfectly. So this is why maybe again like that 27% of the firms that could not successfully enhance their outcomes, one of the reasons could be because of the data quality. Still there are more firms that actually fail when they process the big data. Yeah, because the success rate is really low in this area.
David: 23:00 Yeah, that’s really useful.
Maryam: 23:01 Yeah, they just want to have that. You know it’s a hot topic so let’s have it you know.
David: 23:04 Yes. And if there was one thing that organizations and consultants could take away from this study, what would it be from your position?
Maryam: 23:20 Okay, so I think that organizations should not rush when they want to analyze the data. They need to make sure that they have clean data in every process before they make the conclusion. And they also need to make sure that they hire some employees that have that knowledge. Or maybe they can just train them. Maybe they just don’t want to hire new employees, they can just train them. And they can use different methods to just increase the quality of the data.
Maryam: 23:53 And you can also provide some new policies in the firm that all the employees would be able to make sure that they all are cleaning the data. Make sure the quality of the data is high before they start or continue to make the conclusion. Because the conclusion would be wrong. Again, it’s going to be garbage in, garbage out. You don’t have high quality, you wouldn’t get the results that they are expecting.
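The "clean the data before you draw the conclusion" advice above can be sketched as a simple quality gate. This is only a minimal illustration and not something taken from the study: the function names (`clean_records`, `ready_for_analysis`), the record fields, and the 80% threshold are all hypothetical, and a real policy would cover far more than missing values and duplicates.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Summary of what a cleaning pass kept and dropped (hypothetical structure)."""
    total: int
    kept: int
    dropped_incomplete: int
    dropped_duplicate: int

    @property
    def kept_ratio(self) -> float:
        # Share of the original records that survived cleaning.
        return self.kept / self.total if self.total else 0.0


def clean_records(records, required_fields):
    """Drop records with missing required fields, then drop repeats.

    Two records count as duplicates when all required fields match.
    """
    seen = set()
    cleaned = []
    incomplete = 0
    duplicate = 0
    for rec in records:
        # A field is treated as missing if it is absent, None, or empty.
        if any(rec.get(f) in (None, "") for f in required_fields):
            incomplete += 1
            continue
        key = tuple(rec[f] for f in required_fields)
        if key in seen:
            duplicate += 1
            continue
        seen.add(key)
        cleaned.append(rec)
    report = QualityReport(len(records), len(cleaned), incomplete, duplicate)
    return cleaned, report


def ready_for_analysis(report, min_kept_ratio=0.8):
    """Assumed policy: only analyze if enough of the data survived cleaning."""
    return report.kept_ratio >= min_kept_ratio


# Illustrative, made-up records standing in for collected comments.
records = [
    {"source": "twitter", "text": "great product"},
    {"source": "twitter", "text": "great product"},  # repeat
    {"source": "blog", "text": ""},                  # missing text
    {"source": "facebook", "text": "too slow"},
]
cleaned, report = clean_records(records, ["source", "text"])
proceed = ready_for_analysis(report)
```

The point of returning a report alongside the cleaned data is that the decision to proceed is made explicitly, rather than running the analysis on whatever happened to be collected.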
David: 24:16 Yeah and in fact using big data is like any research really. You’ve got to make sure that you have quality data in order to get a quality outcome, whatever that outcome. And understand what’s really going on and understand the patterns that are going on.
Maryam: 24:28 Exactly. Exactly.
David: 24:28 Yeah I was going to say, maybe if they actually saw the use of big data as a research project that requires good evidence, then things may start to improve.
Maryam: 24:43 Exactly. Like researchers look at all other data. But with these new characteristics that we were talking about. Like the volume, variety, and velocity of the data. But velocity’s one of the most important characteristics that organizations want. When they get the data just [inaudible 00:25:01] the results. I understand that they should make the decisions quickly. If not the competitors will do that part. Without having high quality enough to do this the huge difference from big data and any other data because of the characteristics that it has. So because of those three main characteristics. But yeah, they need to make sure that they understand how big data impacts differently on different aspects of the data and have a different strategy for that.
David: 25:33 Yes, it’s almost their need for speed is undermining their interest in the quality of the data that’s going in.
Maryam: 25:40 Yes, exactly, exactly.
David: 25:44 That’s very useful Maryam. I really appreciate that. So what are you working on at the moment?
Maryam: 25:50 I am working on a lot of other projects in the area of big data and data analytics. And I’m working on both the bright and dark side of the big data. Because you know everybody thinks that big data is really good. Let’s have it. Improve our outcomes. But in my new projects, I am actually saying that, “Big data is not always good. And it also has some negative impacts on the firm outcomes.” So for example in a recent project that I am working on with one of my colleagues, he is at California State University, Fullerton, in the US. And we are working on the dark side. We are thinking that, “Like big data?” So it’s still in the initial stage, that big data may actually enhance knowledge hiding in the organizations.
Maryam: 26:41 Because more people are processing huge amounts of data. They do not even have time to share the knowledge with everybody else in the organization. It also actually enhances the work stress among the employees. They would have more stress when they know that they need to process huge amounts of data really quickly in real time and also process different types of data. And also I am trying to understand the role of autonomy in the organizations. How it will be impacted by big data. So in most of my projects, I am focusing on the negative side of that. How big data would negatively impact other [inaudible 00:27:24]. As much still there are other projects that. So I am focusing on both sides of that. So say that, “You know it’s not always good. There are some funny parts. But you know what? It’s not always bad or it’s not always good.”
David: 27:36 Yes and understanding what it is that turns projects like big data projects into something that’s useful or something that’s actually negative is going to be very, very useful.
Maryam: 27:47 Yes. Exactly, exactly. Because yeah. Again everybody just wants to have it because it’s a hot topic. But it’s not always a good thing.
David: 27:54 Yes. That’s really useful Maryam. I really appreciate that. So what’s the best way for people to be able to follow you in your work?
Maryam: 28:02 So for my work [inaudible 00:28:06] always have all [inaudible 00:28:10] but at DeGroote website. In my McMaster profile.
David: 28:14 At DeGroote.
Maryam: 28:14 And also they can contact me through my email. So my email address is there in my McMaster profile. And also through LinkedIn is also a good way to be contacted. But the list of my publications is at DeGroote School of Business website.
David: 28:32 Brilliant. Okay. Well I’ll put a link in the show notes to your page at DeGroote.
Maryam: 28:39 Okay. Excellent. Thank you very much David.
David: 28:41 It’s been an absolute pleasure Maryam. I have really enjoyed the paper and I have really enjoyed talking to you as well.
Maryam: 28:47 Thank you very much. I also enjoyed it very much. Thank you so much.
David: 28:50 You take care and thank you very much.
Maryam: 28:51 You too. Bye bye.
David: 28:52 Cheers bye.