9 Understanding Data
Data comprises all kinds of values that reflect facts or events that occur in the context of the activity of the firm and that allow one to recreate or know a reality, either at the micro level of the organization (e.g., the firm, a department), or at the macro level (e.g., ecosystem, market). More specifically, the ISO/IEC 2382:2015 standard, focused on the information technology vocabulary, defines data as a “re-interpretable representation of information in a formalized manner suitable for communication, interpretation, or processing”. Data is, therefore, a symbolic representation (numerical, alphabetic, algorithmic, spatial) of a quantitative or qualitative informational attribute or variable that represents an empirical fact, event, or entity, which can be reinterpreted. This means that the context surrounding the data is key for it to acquire its full meaning. Furthermore, data can come from different sources and take different forms. For example, data can be an image from a traffic camera, a message on social media, a voice in a conversation, the temperature of a room, the number of items sold by a store, or the exact time an order was delivered. In addition, data must be able to be communicated and processed, flowing from its source to a processing system. Finally, all data has a life cycle within organizations that begins with collection, goes through storage and processing, and ends with the dissemination of the data converted into valuable information to users (Fig. 9.1).
The term data warehousing refers to back-end management, or those tasks, processes, and capabilities that support data management, including the tasks that need to be performed to prepare the data for analysis and which a user cannot access. Meanwhile, business intelligence refers to front-end (user-oriented) applications that provide data for analysis and decision-making support. The term analytics is concerned with advanced business intelligence methods that include quantitative data analysis (e.g., statistical analysis, predictive models, etc.) (Collier, 2012). As can be deduced from the data life cycle, each of the stages involved will require appropriate management practices, skills, competencies, and technologies to carry out efficient data management and meet the firm’s performance goals.
9.1 Sources of Data
Tourism firms use different types of data that are differentiated by the way in which they are created and the media on which they reside. For example, social media have become one of the main platforms that contribute user-generated data to tourism firms due to the wide support they have received from tourists and society in general. Social media apps are convenient and easy to use, and consumers have them installed on their mobile devices. This greatly facilitates interaction with tourists, many of whom want to share their lives on the internet.
But social media apps are not the only data sources that allow tourism firms to access new data. As more and more people share their lives online and interact with smart devices and sensors of all kinds, large-scale structured and unstructured data is emerging, ushering in the era of Big Data. Moreover, due to the strong development of the Internet of Things (IoT), various sensors are being developed and used to track the movements of tourists and environmental conditions, providing vast amounts of spatio-temporal data (e.g., GPS data, mobile roaming data, Bluetooth data, etc.). In addition to this so-called unstructured data generated by devices, there is data from tourism operations, such as web searches, visits to web pages, online reservations, purchases, etc. (J. Li et al., 2018). These also represent large amounts of data of a transactional nature that have a high value for tourism firms to understand consumer behavior and improve their value offer to customers. Table 9.1 summarizes the main types of data to which tourism firms usually have access and which are generally divided into two categories (structured and unstructured data) and six subcategories (Lv et al., 2021).
Type | Source |
---|---|
Structured data | Professional databases |
Unstructured data | User-generated content |
9.1.1 Structured data
Structured data is data that conforms to a standardized data model, has a well-defined structure and order, and is easily accessible to both humans and machines. At the dawn of the information age, structured data was the main source of information used by tourism firms, governments, and researchers – there wasn’t much more. Nowadays, the main sources of structured data come from business databases, government databases, and industry databases.
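To make the idea of a standardized data model more concrete, the following minimal sketch stores a few booking records in a relational table and aggregates them with a single query. The table name, column names, and values are hypothetical and chosen only for illustration; they do not describe any particular tourism system.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings ("
    " booking_id INTEGER PRIMARY KEY,"
    " hotel TEXT NOT NULL,"
    " nights INTEGER NOT NULL,"
    " price_per_night REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO bookings (hotel, nights, price_per_night) VALUES (?, ?, ?)",
    [("Hotel A", 2, 100.0), ("Hotel A", 3, 90.0), ("Hotel B", 1, 150.0)],
)

# Because every record follows the same model, aggregation is a single query.
rows = conn.execute(
    "SELECT hotel, SUM(nights * price_per_night) FROM bookings GROUP BY hotel"
).fetchall()
revenue_by_hotel = dict(rows)
print(revenue_by_hotel)
```

The point of the sketch is that the fixed structure, rather than the specific tool, is what makes the data immediately accessible to both people and machines.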
Firms usually create a business database to record data related to their operations, transactions, and those events that are important to improve business management, including customer and user data, competitor data, or financial accounting data. Traditionally, CRM tools have been one of the most important sources of data to examine the customer profile in tourism firms, accompanied by manufacturing, supply chain, and accounting information systems.
Government databases are created and maintained by national and regional governments and usually contain data on the economy and its productive sectors. Examples of this type of database are census data, data on national and international visitors to a destination, hotel occupancy, etc. In the case of tourism, some of the most used databases include those related to the licenses of vehicles registered for tourist transport, the traffic flow of motorways, airports and all types of transport, employment in the hotel industry, etc. These data have traditionally been used to forecast tourist demand at the level of tourist destinations, as well as to understand the flows of tourists or the environmental impact of tourism. Of course, there are many other government data sources that can be used, such as the energy consumption of the tourism branches, the flows of cross-border travelers, employment, income per tourist, the number and occupancy of hotels, etc. However, a typical problem with these data sources is that they contain government-owned information that is not always available or up to date.
Sectoral databases are those created and maintained by professional organizations belonging to the tourism sector. A good example is the tourism statistics database of the United Nations World Tourism Organization (UNWTO), which includes data such as the flow of tourists between markets of origin and destination, or the demand and capacity of tourist accommodation. Since this data is already structured, there is no need to carry out a data preparation process beforehand (e.g., data cleaning), so the data can be extracted and processed right away. Despite this great advantage, structured data does not usually adapt flexibly to the objectives pursued by tourism firms, so its effectiveness is generally very limited. This makes it common practice to combine structured data with unstructured data to take advantage of it.
9.1.2 Unstructured data
Rapid advances in internet technologies in recent years have led to the creation of unstructured data on a massive scale. Today’s users can easily post their opinions about their last trip, share comments about what they think of this or that hotel through social media and digital platforms, and express their likes or dislikes about any product or service they have consumed. This is why internet giants like Facebook, Twitter, or Tripadvisor have become authentic hubs of user-generated content (UGC) in the form of reviews, videos, geolocated photos, etc.
All this UGC, together with the data generated through mobile devices and the data obtained through the digital footprint left by users on the web, has enormous value for firms that struggle to know their customers and users at an individual level. Let’s take a closer look at each of these sources below.
9.1.3 User-generated content (UGC)
UGC typically includes two main types of data: 1) textual data related to the travel experience, such as those shared by tourists on social media and blogs; and 2) online photos and videos, such as those shared on Instagram and Flickr. Social networks, blogs, and online booking and review platforms (e.g., Tripadvisor, Booking.com, Airbnb, Ctrip, etc.) are powerful platforms through which tourists share a wide variety of information related to their tourist experiences, express their satisfaction or dissatisfaction with the tourist products and services consumed, or simply recommend a certain place or provider to other tourists. This type of data can be used by tourism firms as a primary source of information to analyze the movements and travel patterns of tourists. Nonetheless, it generally lacks sufficiently detailed contextual information.
Review data is mainly used to assess tourists’ satisfaction with the tourism products and services they consume (e.g., hotels, rural accommodation, restaurants, tourist attractions, etc.), as well as to inform changes that improve the experience of tourists. This data comes primarily from text that tourists post on social media, as well as online reviews on travel blogs, accommodation booking sites, and travel websites. The volume of review data can range from a few dozen to hundreds of thousands of reviews depending on the location or attraction in question.
Blog data is used to analyze the sentiments of tourists and keep track of the places that are most popular. This data is also used to find out the recommendations made by visitors to a destination. Twitter and Sina Weibo are some of the main social blog platforms and blog data sources. By processing blog data, tourism firms can obtain valuable information about the temporal and spatial distribution of tourists, their feelings, and the places they recommend going, or what to do when traveling to a particular tourist destination.
Online photo and video data shared by users on social media and online platforms (e.g., YouTube, Flickr, Pinterest, Instagram, Tripadvisor) also contain very rich and varied information, such as information on the location, time of the visit, or even personal notes of the tourists. This data also provides information on travel recommendations to certain places, travel routes, and user preferences for certain accommodations or tourist attractions. In recent times, this type of unstructured data has increasingly attracted the attention of tourism firms, destinations, and researchers, as it offers a new perspective to study the behavior of tourists and the recommendations they make, which can have an impact on value offerings aimed at the market, the design of tourist routes, and marketing.
Some recent studies have used geotagged data in combination with other data sets to compare visitor movements. For example, after comparing Twitter geotagged data on tourist flows between countries with official tourism statistics, similarities have been found between both data sources. Geotagged data allows for detailed street-level accuracy, especially when compared to other traditionally used methods such as surveys or check-ins at accommodation. However, while this type of data can help tourism firms better understand the relationships between people and places, and provides insight into the context of visitors, it also raises issues related to the reliability of user-generated content, especially in the case of false reviews, or photos and videos that have been manipulated.
9.1.4 Device-generated data (DGD)
Vast amounts of data are also generated today from devices integrated with mobile signals (e.g., GPS positioners, Wi-Fi connections, etc.). Unlike traditional static data, such as that obtained through surveys and panels, device data is used to track the movements of tourists in space, thus allowing tourism firms to understand the behavior of tourists more accurately and efficiently in real time.
Mobile phone signaling data is increasingly being used to track the behavior of individual tourists. This type of data provides temporal and spatial information associated with mobile phone users that makes it possible to track their location and identify patterns of movement within and between tourist places (Xu et al., 2020). The signaling data usually includes the number of the station to which the mobile phone is connected, the time at which the connection takes place, and the encrypted identification number of the mobile phone. This can be useful to analyze the loyalty of tourists to a place, the temporal and spatial patterns of visitor movements, and the distances traveled by tourists. It can also be useful for marketing, as current technology allows mobile phones to be positioned, the origin of tourists to be tracked, and the temporal and spatial changes of visitors to be captured in real time both indoors and outdoors. However, there are several issues related to this data: it is difficult to access due to data privacy, and it often lacks accurate information on the location of users and their socioeconomic attributes, thus reducing its value for tourism firms. Additionally, on many occasions, this data does not distinguish between leisure tourists and business travelers, or between tourists and residents.
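As an illustration of how signaling records of this kind might be turned into movement patterns, the sketch below orders each user's connections in time and collapses repeated connections to the same station. The record layout (encrypted ID, station number, connection time in minutes) and all values are assumptions made for the example, not a real operator format.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical records: (encrypted phone ID, station number, connection time in minutes).
records = [
    ("u1", "S10", 0), ("u1", "S10", 15), ("u1", "S22", 40),
    ("u2", "S22", 5), ("u1", "S31", 95), ("u2", "S10", 60),
]

def station_sequences(rows):
    """Return, per user, the time-ordered list of stations visited."""
    by_user = defaultdict(list)
    # Sort by user and then by time so each user's stations are in visit order.
    for user, station, _minute in sorted(rows, key=lambda r: (r[0], r[2])):
        by_user[user].append(station)
    # Collapse consecutive repeats: staying at one station is a single visit.
    return {u: [s for s, _ in groupby(stations)] for u, stations in by_user.items()}

sequences = station_sequences(records)
print(sequences)
```

Sequences like these could then be counted to find the most common routes between places, although, as noted above, the station location is only a coarse proxy for where the user actually is.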
Data from GPS (a satellite navigation system that provides geographic positioning and time information to a receiver) and GPS-enabled services is another important source of unstructured data. Tourism firms can use this data primarily to examine the spatio-temporal behavior of visitors in places and events. For example, GPS data from taxis has been used to track the movements of tourists in a tourist destination.
GPS data comes mainly from two sources: 1) GPS loggers carried by volunteers or installed in public devices used by tourists in a place (e.g., GPS data from public bikes, or scooters shared on city streets); and 2) GPS-enabled mobile apps installed on users’ own devices. The former can be used in ad hoc analysis involving a small number of loggers or volunteers, like users who rent an electric scooter, bicycle, or motorcycle on the streets. The latter can provide information on the trajectory of tourists, travel routes, and their behavior in a destination, which is useful for modeling and predicting the behavior of tourists in tourist destinations (Z. Chen et al., 2021). In addition, this data is more flexible and cheaper to obtain than the first type, making it the preferred GPS data source. Unlike mobile phone positioning data, GPS data provides more precise information on tourist behavior in time and space, since it is generated continuously and without intervals (Shoval et al., 2014). It is also not affected by weather conditions. However, GPS data sample sizes are often smaller, especially if participant-based samples are used, which inevitably introduces bias. One solution may be to use GPS-enabled mobile apps.
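A simple example of what can be computed from a GPS trajectory is the distance traveled along it. The sketch below uses the haversine (great-circle) formula on consecutive points; the function names are hypothetical, and the spherical-Earth radius is a standard approximation rather than a value taken from this chapter.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def track_length_km(points):
    """Total length of a trajectory given as an ordered list of (lat, lon) points."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))

# Hypothetical three-point trajectory.
print(round(track_length_km([(0.0, 0.0), (0.0, 0.5), (0.5, 0.5)]), 1))
```

In practice, raw GPS points are noisy, so a real analysis would probably smooth or filter the trajectory before measuring it.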
Bluetooth and Wi-Fi data are used much less frequently. Bluetooth is a wireless communication technology embedded in smartphones, laptops, and mobile devices that can monitor a large number of users without prior notice to device owners. Mobile devices carried by people can be detected by sensors equipped with Bluetooth technology that can thus track the individual behavior of a large number of tourists. Due to the very short radio range of Bluetooth data reception, this type of data is used in very small-scale contexts, such as monitoring the movement of visitors in tourist activities or planned events (e.g., festivals, concerts, etc.). Bluetooth technology avoids the need for users to preregister (compared to mobile apps and other technologies), is low cost, and highly convenient. In addition, this data can be used in indoor environments where GPS or mobile phones may have problems due to weak signal.
Compared to Bluetooth, Wi-Fi data is more convenient and cost efficient as it is available on all modern smartphones. However, like Bluetooth data, Wi-Fi data has a very low coverage range compared to mobile data and allows users to be tracked without prior notice. Few tourism firms have used Wi-Fi data to date, although with the growth of Wi-Fi services in public spaces, Wi-Fi data is expected to grow and become more accessible in the future. It should also not be forgotten that this unannounced tracking of users, common to Bluetooth and Wi-Fi data, may conflict with the principle of opt-in data collection.
9.1.5 Web search and transaction data
Unstructured data can also come from the traces left by users when browsing the web, in searches for information made through search engines, or purchases and reservations made online. Search engines like Google, Yahoo, or Baidu are some of the main sites tourists use to plan a trip or vacation. Search “footprints” are a valuable source of data on the tourism demand side and reflect the interest of tourists in places, products, and tourism providers. The data generated from tourists’ online purchases and web browsing can be used to analyze things such as preferences and behavior patterns of tourists, as well as to forecast the arrival of tourists to a destination or a hotel (e.g., via Google Trends, Google Analytics traffic indicators, or the Baidu Index in China). It can also be used to predict economic indicators such as room demand or prices in a destination. Other transactional data, such as online room reservations or the purchase of travel products, registered by the transactional systems of tourism firms, can help owners and managers make decisions about what to offer customers or where to invest more. When tourists pay for their trips or reservations by credit card, the data is recorded and can be used later by firms to analyze the visitor’s purchasing behavior and, where appropriate, design a personalized product.
9.2 Data Analytics
Analytics is not a new idea: it has been around since the 1950s. The origins of analytics were rudimentary, with access to a very small number of data sources, most of which came from internal business systems and were stored in a single repository, such as a data mart or data warehouse. The only types of analysis that could be done at that time were descriptive, and business intelligence consisted of a few reports on the most relevant business processes. In the early 2000s, companies like Google and Yahoo began to use Big Data to analyze the behavior of their users, and everything changed radically. The term Big Data became popular, and data analytics and traditional business intelligence were turned upside down.
Data analytics uses two sets of methods to answer different types of questions: descriptive analytics methods and predictive analytics methods (Andersen et al., 2018). Descriptive analytics analyzes data to answer the question: “what happened”. This approach informs human decision makers about the past and present factors that explain the occurrence of a certain situation or event. Decision-making is therefore reactive, since it occurs once it is understood why things happened in the past. Techniques include descriptive statistics on combined or aggregated data to find relationships, patterns, and trends between them (e.g., cross-tabulation, correlation, and regression models). Typical examples of descriptive analytics are reports and dashboards for senior management based on key performance indicators (KPIs).
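Two of the descriptive techniques mentioned above, cross-tabulation and correlation, can be sketched in a few lines. The booking records and variable names below are hypothetical, and the correlation is the standard Pearson coefficient computed by hand rather than with a statistics package.

```python
import math
from collections import Counter

# Hypothetical booking records: (market of origin, booking channel).
bookings = [("DE", "web"), ("DE", "agency"), ("UK", "web"), ("UK", "web"), ("DE", "web")]

# Cross-tabulation: number of bookings in each (origin, channel) cell.
crosstab = Counter(bookings)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly series: advertising spend and room nights sold.
ad_spend = [10, 12, 15, 11, 18]
room_nights = [200, 220, 260, 210, 300]
print(crosstab[("DE", "web")], round(pearson(ad_spend, room_nights), 2))
```

Both results only summarize what has already happened, which is why decisions based on them are described as reactive.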
Predictive analytics is a data analytics approach geared towards answering the question: “what’s going to happen”. The analyst’s attention in this case is more focused on the future than on the past, so predictive techniques become increasingly advanced and sophisticated (e.g., estimation of probabilities, predictive models that combine classical statistics and machine learning, etc.). Decision-making supported by predictive analytics is of an active type, since it tries to foresee what is going to happen and anticipate the firm’s response.
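A very simple instance of a predictive model is a straight-line trend fitted by ordinary least squares and extended one period ahead. The sketch below assumes a hypothetical series of monthly arrivals; real forecasting models would usually account for seasonality and other factors that this example ignores.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical history: arrivals observed in months 1 to 5.
months = [1, 2, 3, 4, 5]
arrivals = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, arrivals)
forecast = slope * 6 + intercept  # predicted arrivals for month 6
print(forecast)
```

The firm can then act on the forecast before month 6 arrives, which is what distinguishes the active mode from the reactive one.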
An even more advanced level of analytics is prescriptive analytics, which is intended to guide the firm’s decision-making and action plans. This approach involves a proactive mode of decision-making. The main difference between the active mode (predictive analytics) and the proactive mode (prescriptive analytics) lies in the organizational capabilities that the firm needs to implement the results of the analysis, rather than in the techniques used (Andersen et al., 2018). In the proactive mode, a combination of statistical models based on regressions (descriptive analytics) and probabilities (predictive analytics) is used to make rules-based, simulation, and optimization decisions. Current applications of prescriptive analytics include the analysis of dynamic and complex networks, often combining a variety of transactional, process, and sensor data streams.
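To illustrate the optimization side of prescriptive analytics, the sketch below recommends a room price by evaluating the expected revenue of each candidate price. The demand function, the capacity, and the price grid are purely hypothetical assumptions; in a real setting the demand estimate would come from a predictive model.

```python
def expected_bookings(price, capacity=100):
    """Assumed demand model: 200 - price rooms, limited to the range 0..capacity."""
    return max(0, min(capacity, 200 - price))

def best_price(candidates):
    """Return the candidate price with the highest expected revenue."""
    return max(candidates, key=lambda p: p * expected_bookings(p))

# Hypothetical price grid from 50 to 200 in steps of 10.
recommended = best_price(range(50, 201, 10))
print(recommended, recommended * expected_bookings(recommended))
```

Here the output is not a description or a forecast but a recommended action, which the firm still needs the organizational capability to implement.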
Traditional tourism firms have so far focused almost exclusively on descriptive analytics. However, as the amount of data that firms can access has grown exponentially, it has become possible for firms to start extracting more value from it, provided they are able to store and process data much faster. This has caused predictive and prescriptive analytics to gain significant momentum, and firms are realizing that they must acquire new skills and capabilities to keep moving at a faster pace (Larson & Chang, 2016). Therefore, tourism firms must begin to familiarize themselves with increasingly advanced analytical techniques to move from reactive to active decision-making. This means that firms must take steps forward, evolve towards data science, and learn to combine cutting-edge techniques such as machine learning. For many tourism firms, making this leap from descriptive analytics to advanced modes of predictive analytics will surely be difficult. Nevertheless, owners and managers should be aware that increasing the knowledge and skills for data analysis has become a crucial driver of business competitiveness, without which data has no value.
9.3 Business Intelligence
Business intelligence includes the set of processes, technologies, and applications required to collect, analyze, and visualize business data not typically provided by regular reports and that can be used to support both operational and strategic decision-making (Larson & Chang, 2016; Marcello et al., 2018). Business intelligence empowers tourism firms by giving them an edge through information and knowledge management to improve business decision-making.
Modern business intelligence dates back to the 1990s when tools specialized in extracting, processing, and storing data in a central data warehouse began to emerge. These tools were used to organize, analyze, and visualize data in a descriptive way, such as through Online Analytical Processing (OLAP), which allowed sales, marketing, and dashboard reports to be generated for senior management by querying large amounts of multidimensional data (called OLAP cubes). However, the complexity involved in handling these tools made it difficult for inexperienced users with no prior knowledge of data analytics to perform basic business intelligence tasks on their own.
Business intelligence has continued to evolve ever since to provide tourism firms with historical, current, and predictive views of their key business variables and business processes, and increasingly to support decision-making activities that generally involve choosing between different alternatives. Today, firms use business intelligence to inform a wide variety of operational decisions, from product positioning and pricing, to generating insights into new markets, assessing the suitability of products and services for certain market segments, and measuring the impact of marketing and advertising strategies. In addition, with the exponential growth of the internet and the massive adoption of mobile devices by consumers, the IT industry has begun to address the difficulties related to the speed of data processing, which will result in tourism firms having tools capable of processing and analyzing large amounts of structured and unstructured data in real time, as well as cloud-based on-demand analytics capabilities. Ultimately, business intelligence becomes particularly powerful when it combines external data obtained from markets and customers with internal data from the firm itself (e.g., financial data, customer data, bookings, etc.). In this way, business intelligence has reached a point where it is no longer a complementary utility within a more general framework of business analytics applications, but rather an essential requirement for those firms willing to remain competitive.
9.4 Data Vocabulary
The world of data has become very complex in recent years and is expected to become even more so in the future. This, coupled with the fact that data is gaining strong momentum in tourism, means that new terms appear from time to time to refer to the innovative practices, tools, and capabilities that are shaping the Smart Revolution. Even terms whose meaning seemed conventionally accepted sometimes acquire a new meaning in light of the latest technological and management advances.
In the face of this increasing sophistication in data management and its accompanying terminology, owners and managers need to understand the meaning of these key concepts, otherwise confusion sets in and they become unable to fully comprehend the true dimension of the phenomenon. For this reason, in this section we are going to review some key concepts with which every owner and manager of a tourism firm should be familiar, and which are relevant to framing and understanding the next chapters of the book.
9.4.1 Data economy
As the volume of data has grown enormously, business organizations around the world have started to focus more and more on the “Data Economy”. This circumstance is not only the direct result of the amount of data that firms are now able to collect, but also of the great variety of data captured, the general increase in the computing power available to extract value from data, and the substantial reduction in the price of computing capabilities, all of which have fueled the expansion of the so-called Data Economy. The Data Economy is defined as the set of goods and services whose value is based on the exploration and exploitation of existing databases with the aim of creating new value offerings that firms use to remain competitive. The Data Economy considers that data is at the center of all economic activity and that the development of Big Data, cloud services, and the IoT is essential for the competitiveness of economies. In short, the Data Economy recognizes that data is a key intangible asset for value creation that, together with high-performance computing, is changing the way knowledge is created and shared. Hence, data is a catalyst for economic growth, innovation, and digitalization of SMEs in all economic sectors and society in general.
9.4.2 Data science
Data science is a broad term that encompasses the set of technologies and techniques that are used in advanced analytics on large amounts of data. As the name implies, it is a scientific approach of an interdisciplinary nature focused on extracting information from data in various ways to produce data products or models. Data science integrates next-generation data analytics fields such as statistics, data mining, and machine learning under a single scientific umbrella. As such, data science transcends the methods of traditional descriptive statistics, moving beyond identifying patterns in static and historical data to focus on combining static and dynamic data from very different sources that are then analyzed at high speed using techniques of descriptive, predictive, and prescriptive analytics. For example, the data about a customer’s purchases may contain a pattern that tourism firms can learn in order to make relevant suggestions for future purchases by the same customer. These buying patterns can then be compared with other cross-sectional buying patterns from customers and correlated with seemingly unrelated data (e.g., geographic location, weather conditions, socioeconomic context, etc.) to gain a more comprehensive understanding of customer buying behavior under a multitude of conditions (Unhelkar, 2017). Figure 9.2 illustrates the different disciplines that are integrated within modern data science, which combines broad and multidisciplinary skills that include mathematics, statistics, computer science, and knowledge at the sector or industry level.
As Big Data evolves, the need to analyze fast-moving, non-static data will increase. Advances in data science are making it possible to combine large amounts of static data with real-time event processing, dynamic application of business rules, and the incorporation of cognitive and machine learning algorithms, thus transcending traditional statistical analysis techniques. Therefore, data science has become a key toolkit for developing Big Data initiatives and supporting the role of the knowledge worker in business firms. However, as is often the case with other disruptive technologies, the challenge does not lie so much in data science or Big Data technologies, but rather in the process of organizational transformation that each firm should properly implement as a result.
9.4.3 Data scientist
In the context of the Data Economy and the growth of Big Data, it is crucial for business owners and managers to understand what kind of skills are needed to succeed. As more tourism firms get involved with data, new career profiles are created, among which the data scientist stands out. Data scientists are an evolution of the traditional data analyst role that already existed in many organizations.
What sets data scientists apart is that they combine computing and hard science skills that go beyond the skills of a data analyst. According to IBM (2022), a data scientist is someone with the following skills and abilities:
- Knows and applies statistics, mathematical modeling, and the scientific method.
- Uses a wide variety of tools and techniques to evaluate and prepare data, ranging from SQL to data mining and integration methods.
- Uses predictive analytics and artificial intelligence, including machine learning and deep learning models, to extract insights from data.
- Writes code for applications that automate processing and calculations with data.
- Is a storyteller who conveys the meaning and importance of results to decision makers and stakeholders.
- Explains to others how the results can be used to solve business problems.
Therefore, the competence of data scientists goes beyond the use of classic software tools for data management and business analytics. They know how to choose the right problems to investigate using business logic, build and execute code to create models and prototypes of solutions, including data visualization tools, and use the results obtained to develop products or services based on valuable information that was previously hidden. As these “unicorns” are scarce in the labor market, it is usually more effective and easier for tourism firms to create working groups made up of a combination of professionals (both internal and external) who together bring these skills and can add value to the organization (Del Vecchio, Di Minin, et al., 2018).
9.4.4 Data mining
Data mining is the analysis of large data sets stored in databases, warehouses, or other repositories of information to extract hidden information and uncover potentially useful patterns for use by stakeholders. As such, data mining is an interdisciplinary field that brings together machine learning, statistics, databases, and visualization techniques (Castaldi et al., 2018). The objective of data mining is to efficiently build predictive or descriptive models based on a large amount of data that, besides explaining them, can be generalized to new data. A typical data mining process includes these steps (F. Chen et al., 2015):
- Data preparation: It usually encompasses three steps: integrating data from various data sources and cleaning the noise contained in the data; extracting some parts of the data (target data) and loading it into the data mining system; and preprocessing the data to facilitate data mining.
- Data mining: It consists of applying algorithms to target data in order to find patterns and evaluating the knowledge discovered from those patterns.
- Data presentation: It is about visualizing the data and delivering the extracted knowledge (insights) to the users.
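The three steps above can be sketched as a small pipeline. In this example, the raw data, the function names, and the choice of pattern (pairs of products frequently bought together) are all hypothetical assumptions; actual data mining systems work at a much larger scale and with more elaborate algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources: products purchased per customer.
source_a = [["flight", "hotel"], ["flight", "hotel", "car"], None]
source_b = [["hotel", "Flight "], ["tour"], []]

def prepare(*sources):
    """Preparation: integrate sources, drop empty records, normalize item names."""
    baskets = []
    for source in sources:
        for basket in source:
            if basket:  # skip missing (None) and empty records
                baskets.append(sorted({item.strip().lower() for item in basket}))
    return baskets

def mine_pairs(baskets, min_support=2):
    """Mining: keep product pairs that appear together in at least min_support baskets."""
    counts = Counter(pair for b in baskets for pair in combinations(b, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

def present(patterns):
    """Presentation: turn patterns into readable lines, most frequent first."""
    ranked = sorted(patterns.items(), key=lambda kv: -kv[1])
    return [f"{a} + {b}: {n} baskets" for (a, b), n in ranked]

patterns = mine_pairs(prepare(source_a, source_b))
report = present(patterns)
print(report)
```

Note how most of the code deals with preparation rather than with the mining step itself, which is also typical of real projects.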
One of the most compelling applications of data mining is through artificial intelligence. With artificial intelligence, data scientists seek to mimic the efficiency of the human brain in processing information and capturing critical aspects of data in a way that enables future use. In this way, data mining offers the automatic discovery of previously unknown patterns and the prediction of trends and behaviors. These new technologies complement conventional decision support tools, giving business analysts and marketers new opportunities to analyze the business more quickly and efficiently. For tourism owners and managers, data mining opens up new opportunities to create products and services capable of responding to rapidly changing market conditions and remaining competitive.
9.4.5 Machine learning
Machine learning is a branch of artificial intelligence in which the machine or system can learn from data, find patterns, and automatically make decisions based on this learning. Machine learning stems from computational learning theory and provides an informational system with the ability to learn automatically by detecting patterns, without having to be explicitly programmed, and then make predictions (Noor & Haneef, 2020). Machine learning techniques are now being used in complex fields as varied as biology, medicine, social media, astronomy… and tourism, to find hidden insights within data.
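The idea of learning from data without explicit programming can be illustrated with one of the simplest methods, the nearest-neighbor classifier, which labels a new case with the label of the most similar known case. The training examples, features, and segment labels below are hypothetical, and the features are left unscaled for brevity, which a real application would normally correct.

```python
import math

# Hypothetical training data: (nights stayed, spend per night) -> traveler segment.
training = [
    ((1, 200.0), "business"), ((2, 180.0), "business"),
    ((7, 60.0), "leisure"), ((10, 50.0), "leisure"),
]

def predict(features, examples=training):
    """Return the label of the training example closest to the given features."""
    nearest = min(examples, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]

print(predict((8, 55.0)), predict((1, 190.0)))
```

No rule such as "long stays are leisure trips" is written anywhere in the code; the behavior comes entirely from the examples, so different examples would yield different predictions.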
9.5 Discussion Questions
What limitations or barriers do tourism SMEs have to generating large amounts of data and exploiting it to its full potential?
How do tourism SMEs store and process their data? Are there differences between small and large tourism firms?
What stage of the data life cycle is currently the most challenging for the tourism firm?
What types of data sources are most used by tourism firms? What mix of structured and unstructured data is optimal?
What are tourism firms doing to integrate data and knowledge management capabilities in the organization?
What mode of decision-making currently prevails in tourism firms (i.e., reactive, active, proactive)? What reasons explain it?
What is the potential of machine learning for the creation of innovative tourism products and services?