“This is the paradox: effective response in the ‘Networking Age’ requires open data and transparency, but the more information that is shared the more risks and challenges for privacy and security emerge” (UN OCHA, 2014a). Disasters and emergencies create situations of vulnerability for affected populations that often leave them more exposed to harm. There are many ways in which data scientists and analysts can do harm, for instance, by exposing the private lives of people, or by putting response operations in danger.
Privacy and security concerns should not be used as a reason not to apply new communication technologies during emergencies, but they should be taken into account (UN OCHA, 2012). At a high level, researchers and practitioners need to strike a balance between full opacity and full transparency, weighing security concerns against operational needs. Given their risk aversion, it may be impossible to engage with some organizations, particularly those that operate at a national or international level, without some formal agreements in place, particularly regarding personal data protection standards.
Social media brings more complexity to emergency and disaster situations by reducing the amount of control that responders and relief workers have. For instance, emergency incident scenes have traditionally been delimited and closed by barriers such as the proverbial “yellow tape” used by the police in many countries. Social media and mobile phone cameras can nearly evaporate this traditional scene control (Crowe, 2012, ch. 5). Rescue personnel, including firefighters, police officers, and others who physically converge on the scene have responsibility and legal codes to respect. As the public also enters the scene (physically or virtually), they must adhere to ethical standards they may be unfamiliar with. Technology can be used to weaken or to strengthen those standards.
This chapter is an attempt to highlight some of the ethical concerns around processing social media data during emergencies. Some of these concerns have elicited some degree of self-regulation, for instance, in the form of ethical guidelines, while others have not.
We start with widely agreed-upon areas in which the practitioner community has already developed some guidance, including privacy (§11.1) and human conflict (§11.2). Then, we venture into territory where the questions are newer and hence less guidance can be provided, such as the protection of digital volunteers (§11.3), and issues related to experimentation (§11.4) and data sharing (§11.5).
As noted by many bloggers and journalists, many of the Syrian refugees fleeing from the war in 2015 were carrying smartphones. One of them told an AFP reporter: “Our phones and power banks are more important for our journey than anything, even more important than food.” Smartphones provide a guide, a map, and help refugees navigate many issues, including asylum bureaucracy. In extreme circumstances, a message with geographical coordinates sent from a sinking boat can be the difference between life and death.
The question of whether people will continue using social media during crises is really a question of whether they will continue using social media at all; in times of crisis, people use the tools that are most familiar to them (Potts, 2013). As long as people use mobile technologies and social media, these technologies will continue to play a key role in the way they communicate during disasters and humanitarian crises.
Different emergency response and humanitarian relief organizations make different choices with respect to how to be part of this conversation. Some have embraced social media, others have remained on the sidelines, and most are somewhere in between. Individuals in these organizations also make their own choices, which follow to some extent – but seldom completely – whatever is mandated by organizational policies. Individuals with more interest and/or competence in social media have been driving forces in changing their organizations and their policies.
Computing researchers and practitioners, especially during the early years of crisis informatics, often developed methods with little or no input from the crisis and disaster management community. This has changed in recent years, as more interdisciplinary research projects appear, both big and small. These projects can be very rewarding, but they are also very challenging for all involved.
This chapter integrates part of the discussion of the previous chapters by using two paradigms: information quality (§12.1) and peer production (§12.2). Next, we address two emergent topics: using technology to support institutional communications (§12.3) and processing user-generated videos for crisis response (§12.4). We conclude by outlining relevant factors for future developments in this field (§12.5).
During the 2015 Nepal earthquake, a 26-year-old Indian lawyer and activist posted the following on Twitter:
Media must report about d alleged 20k RSS chaps off 2 #Nepal.here's a pic coz d 1 @ShainaNC shared isn't true.. ;)
Meaning: media must report about allegations that twenty thousand volunteers from India's Rashtriya Swayamsevak Sangh (RSS) had joined the relief efforts in Nepal, as falsely claimed on Twitter by Shaina NC (a member of the Bharatiya Janata Party, a political group close to the RSS). This message mixes shortened words (“d” for “the,” “2” for “to,” “coz” for “because,” “pic” for “picture”), ambiguous abbreviations (“RSS,” which may mean a number of things), British slang (“chaps”), platform-specific codes (such as the hashtag #Nepal and the user mention @ShainaNC), punctuation/capitalization issues (lack of spacing between #Nepal and here, usage of two dots instead of an ellipsis), and sarcasm expressed through a “wink” emoticon (“;)”).
In general, understanding a message in social media requires contextual information to compensate for fragmented, ambiguous – in other words, vague – text that is open to more than one interpretation.
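One common first step when processing such text is lexical normalization of shortened forms. The sketch below assumes a small hand-built lookup table; both the table and the function name are ours, for illustration, and a real normalizer would be far more comprehensive:

```python
# A minimal sketch (not a production normalizer): expand a hypothetical
# lookup table of common shortenings before further NLP processing.
SHORTENINGS = {"d": "the", "2": "to", "coz": "because", "pic": "picture"}

def normalize(tokens):
    """Replace known shortened forms; leave hashtags and mentions untouched."""
    out = []
    for tok in tokens:
        if tok.startswith(("#", "@")):
            out.append(tok)  # platform-specific codes are kept as-is
        else:
            out.append(SHORTENINGS.get(tok.lower(), tok))
    return out

print(normalize(["Media", "must", "report", "about", "d", "pic", "#Nepal"]))
```

Ambiguous abbreviations such as “RSS” cannot be resolved by a table like this one; they require the contextual disambiguation discussed above.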
This chapter is about Natural Language Processing (NLP), which encompasses computational methods created for dealing with human language. NLP methods incorporating statistical machine learning elements were developed in the 1980s and 1990s using mostly professionally written texts, such as newspaper articles. Since the late 1990s and the 2000s, these methods have been extended to deal first with Web content, and in the late 2000s and early 2010s, with social media messages and short text messages sent from mobile phones (SMS). Many modern NLP methods are based on machine learning.
The next section (§3.1) describes the text of social media messages. Then, we outline basic NLP methods such as tokenization, stemming, part-of-speech tagging, and dependency parsing (§3.2), as well as sentiment analysis/opinion mining (§3.3). Next, we describe how to locate references to entities such as people and organizations (§3.4), and, particularly, places (§3.5). Finally, we refer to methods for extracting structured data from unstructured text (§3.6), and for adding semantics to messages (§3.7).
Social Media Is Conversational
In general on the Internet “we find language that is fragmentary, laden with typographical errors, often bereft of punctuation, and sometimes downright incoherent” (Baron, 2003).
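Such text defeats naive whitespace tokenization. The following is a minimal sketch of a social-media-aware tokenizer that keeps hashtags, user mentions, and simple emoticons intact; the pattern is illustrative, not exhaustive:

```python
import re

# A minimal sketch of a social-media-aware tokenizer: unlike a plain
# whitespace split, it keeps hashtags, user mentions, contractions,
# and simple emoticons as single tokens.
TOKEN_RE = re.compile(r"""
    [#@]\w+            # hashtags and user mentions
  | [:;]-?[)(DPp]      # simple emoticons such as :) ;) :-D
  | \w+(?:'\w+)?       # words, with an optional apostrophe (isn't)
  | [^\w\s]            # any remaining punctuation, one symbol at a time
""", re.VERBOSE)

def tokenize(text):
    return TOKEN_RE.findall(text)

print(tokenize("here's a pic coz d 1 @ShainaNC shared isn't true.. ;)"))
```

Production tokenizers also handle URLs, elongated words (“soooo”), and Unicode emoji, all of which are left out of this sketch.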
In 2008 Google launched Flu Trends, showing that the search volume of certain terms in a region was strongly correlated with levels of flu activity in that region. They also found that the increase in the usage of flu-related terms happened days before health care authorities were able to report an increase in cases of flu. The reasons are twofold: there are delays in the official data collection done from hospitals, and people search for symptoms before visiting a doctor. Despite the success of Flu Trends, it was not beyond criticism. Lazer et al. (2014) highlighted a series of issues with its predictions, including a systemic bias that produced an overestimate in 100 out of the 108 weeks analyzed during a two-year period. As a more general criticism, Lazer et al. denounced this as an example of big data hubris: the assumption that a large dataset can be a substitute, rather than a supplement, to a traditional analysis method. The popular press has lambasted “big data fundamentalism,” the idea that larger datasets imply more objective results.
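At the core of the Flu Trends observation is a simple correlation between two time series: query volume and official case counts. A minimal sketch, with made-up weekly figures:

```python
# A minimal sketch of Flu-Trends-style reasoning: measure whether a
# query-volume series correlates with an official case-count series.
# The weekly numbers below are made up for illustration.
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

searches = [120, 150, 300, 800, 950, 400]   # weekly "flu" query volume
cases    = [ 10,  14,  35,  90, 110,  50]   # weekly confirmed flu cases
print(round(pearson(searches, cases), 3))
```

A high correlation on historical data says nothing about robustness going forward, which is precisely the point of the criticism above: the search-behavior signal can drift while the official signal does not.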
Researchers performing social science research have embraced and criticized, sometimes at the same time, the usage of large-scale datasets from social media. Social media, as a reflection of social interactions at large scale and in digitally accessible formats, provides a larger quantity of data at a much lower cost than alternative datasets, such as surveys or direct observations. However, the infamous “streetlight effect” may be at play here: scientists inclined to search for evidence where it is easier, instead of where better evidence is likely to be found.
The representativeness of social media and other types of digital traces, and their lack of context, are often cited as key reasons to distrust conclusions based solely on them. “Just because you see traces of data doesn't mean you always know the intention or cultural logic behind them. And just because you have a big N doesn't mean that it's representative or generalizable.” For instance, methods that use trends found on Twitter data as direct predictions of political election results have been to a large extent debunked (Gayo-Avello, 2012).
This chapter warns against a naïve interpretation of results obtained from social media data from emergencies. The quality of social media data for this purpose is affected by at least two types of factors.
One of the main reasons why social media is relevant for emergency response is its immediacy. For instance, the first reports on social media about the 2011 Utøya attacks in Norway appeared 12 minutes before the first news report in mainstream media (Perng et al., 2013), and in the 2013 Westgate mall attacks, social media reports appeared within a minute after the attack started, “scooping” mainstream media by more than half an hour. People on the ground can collect and disseminate time-critical information, as well as data for disaster reconnaissance that would otherwise be lost in the gap between a disaster and the arrival of reconnaissance teams on site (Dashti et al., 2014).
On a lighter note, it has been speculated, jokingly but plausibly, that the damaging seismic waves from an earthquake, traveling at a mere three to five kilometers per second, can be overtaken by social media messages about them, which propagate orders of magnitude faster through airwaves and optical fiber.
In this context, it is not surprising that people who associate social media with immediacy also expect a fast response from governments and other organizations, for instance, expecting help to arrive within a few hours of posting a message on social media (American Red Cross, 2012). Independently of whether those expectations are met or not in the near future, some capacity for rapid response to social media messages needs to be developed.
We recall from Section 1.5 that our main requirements are to create aggregate summaries about broad groups of messages (capturing the “big picture”), and to detect important events that require attention or action (offering “actionable insights”). We now add a new requirement: timeliness.
This chapter describes methods that ensure that the output summaries or insights are generated shortly after the input information required to create them becomes available. The way to achieve this low-latency or real-time data processing is to adapt a computing paradigm known as online processing, or equivalently, to consider that the input data is not a static object, but a continuously flowing data stream.
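As a minimal illustration of the online paradigm, the sketch below maintains a message count over a sliding one-minute window, updating its state as each item arrives instead of reprocessing the whole stream. The class and parameter names are ours, for illustration:

```python
from collections import deque

# A minimal sketch of online (streaming) processing: count messages in a
# sliding one-minute window without ever storing the full stream.
class SlidingCounter:
    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.timestamps = deque()

    def add(self, t):
        """Register a message arriving at time t (seconds), expiring old ones."""
        self.timestamps.append(t)
        while self.timestamps and self.timestamps[0] <= t - self.window:
            self.timestamps.popleft()

    def rate(self):
        """Messages seen within the current window."""
        return len(self.timestamps)

c = SlidingCounter(window_seconds=60)
for t in [0, 10, 20, 65, 70]:   # message arrival times in seconds
    c.add(t)
print(c.rate())                 # messages in the last minute
```

Each update costs amortized constant time, which is what makes this kind of operator usable on a continuously flowing data stream.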
We begin by explaining how online processing differs from offline processing (§6.1), and present high-level operations on temporal data (§6.2). Then, we describe the framework of event detection (§6.3) and methods for finding events and subevents (§6.4). We also introduce the approach of incremental update summarization (§6.5), and end with a discussion of domain-specific approaches (§6.6).
“What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.” Patrick Meier's pioneering work in crisis mapping capitalizes on two key advantages of crisis maps: they act as a focus for digital volunteering efforts, and they provide information in a way that is familiar and easy to digest by relief agencies.
Previous chapters have described several methods for filtering, classifying, consolidating, and extracting trends from social media messages. A final, but no less important, step of this process is to present this information in a way that is helpful to its intended users. Emergency managers often express the requirement to incorporate social media messages into a Geographical Information System (GIS), using a map-based display (Hiltz et al., 2014). Beyond maps, other visualizations have been used to highlight other aspects of the data, such as temporal trends, themes, topics, or connections.
This chapter describes current practices in the presentation of crisis data extracted from social media. It builds upon the various methods to process and consolidate social media messages presented in previous chapters. The emphasis is on crisis maps, probably the most popular paradigm used by volunteer communities and social media users to present information about disasters (§10.1). The chapter also analyzes other visual elements that are present in existing “dashboards” about social media during crises (§10.2), including their interactive elements (§10.3).
Figure 10.1 depicts various types of maps, which we describe in this section. The choice of which type of map to use depends on the requirements of end users. At a high level, we continue using the breakdown presented in Section 1.5 between user requirements seeking “actionable insights” and user requirements capturing the “big picture.” This distinction can be used to select an appropriate type of map for a given application.
Actionable insights. End users interested in actionable insights would prefer representations in which information items are presented individually, or grouped in small clusters. This requirement can be satisfied using a dot distribution map, or a proportional symbol map.
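The two styles can be contrasted in a few lines of code: a dot distribution map keeps each report as an individual point, while a grid aggregation (closer to the “big picture” view) pools reports into cells. A minimal sketch, with made-up coordinates and helper names of our choosing:

```python
# A minimal sketch contrasting a dot map (individual reports) with a
# grid aggregation (counts per cell). Coordinates are made up.
reports = [(27.70, 85.32), (27.71, 85.33), (27.72, 85.31), (28.20, 84.00)]

def grid_cell(lat, lon, cell_size=0.5):
    """Snap a coordinate to the lower-left corner of its grid cell."""
    return (round(lat // cell_size * cell_size, 2),
            round(lon // cell_size * cell_size, 2))

# Dot distribution map: plot every report individually.
dots = list(reports)

# Grid aggregation: one symbol per cell, sized by the number of reports.
counts = {}
for lat, lon in reports:
    cell = grid_cell(lat, lon)
    counts[cell] = counts.get(cell, 0) + 1

print(counts)
```

The cell size controls the trade-off: smaller cells preserve actionability, larger cells smooth the data into a regional overview.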
In August 2014, CBS News published a story and a cellphone photo of a bizarre meteorological phenomenon. The reporter used a photo provided by a tugboat captain, who stated that he was not a meteorologist but described the image as a rare “sideways tornado.” The phenomenon is actually more than rare: it does not exist. The reporter could have consulted with the TV station's meteorologist, who later easily identified the photo as a shelf cloud. The story was pulled off their website and then amended, but the embarrassment for the news network did not go away.
Hoaxes in media are centuries old. Noted satirists such as Jonathan Swift in the eighteenth century and Mark Twain in the nineteenth were successful at spreading them well before the Internet (Walsh, 2006). Disaster-related media hoaxes predate the Internet by decades. A famous example was the 1938 radio adaptation of the alien-invasion novel by H. G. Wells, The War of the Worlds, which at the time caused numerous calls to newspapers and the police, and created a significant media backlash for (unintentionally, according to its producers) “deceiving” the listeners. Social media simply places the tools necessary to create and spread all kinds of information, including hoaxes, in the hands of many.
This chapter deals with concerns about the presence of false information in social media, which are frequently cited as one of the major obstacles to its adoption by humanitarian and emergency relief organizations (Hiltz et al., 2011; Hughes et al., 2014b). Officers at these organizations have said that they often find themselves wondering if they can trust a given piece of information from social media, or not. Some of them believe that social media are more likely than other sources to contain bad, false, unverified, or inaccurate information (Bressler et al., 2012; Vieweg et al., 2014). Emergency managers, who may also want to integrate information provided by the community, have also expressed doubts about the accuracy and reliability of social media (Merrick and Duffy, 2013). While disaster response organizations are used to operating with “good enough” information during emergencies, they seem to hold higher, even “unreasonable” standards of accuracy for data from social media (Tapia and Moore, 2014).
In social media during emergencies, it is easier to find a message describing something that is true than it is to find a message describing something that is false (Mendoza et al., 2010).
Crises and disasters are portrayed prominently in both legacy and online media, and attract the attention of millions of Internet users. Many of these users are willing to help humanitarian relief efforts remotely, through tasks that range from providing, curating, and synthesizing information about the disaster (Vieweg et al., 2010; Gao et al., 2011; Starbird, 2012a; Liu, 2014) to performing crisis mapping and all sorts of digital humanitarian tasks (Meier, 2015).
While a seamless integration of digital volunteering efforts into professional/formal response efforts has yet to be realized, volunteer groups have been successful in certain areas, such as creating maps that are useful for humanitarian organizations: “After Haiyan [November 2013 typhoon in the Philippines], many relief organizations, including the OCHA and the medical aid group Médecins Sans Frontières (also known as Doctors Without Borders), have gone into the Philippines carrying with them continually updated maps of the country generated by more than 1,000 OpenStreetMap volunteers from 82 countries” (Butler, 2013). A survey among officers from large humanitarian organizations found that many of them used some type of volunteer-processed social media data (Tapia and Moore, 2014).
Volunteering during disasters is not a product of new media or new technologies. Instead, it is an integral part of how communities react to disasters (Dynes, 1970). What is new today is that electronic communications have effectively redefined the boundaries of these communities. The “village” that feels, for instance, the devastating effects of a typhoon in the Philippines is indeed a global village. People living half a world away can become actively involved, and even become key players in the response to an ongoing crisis (Carvin, 2013).
In general, the involvement of volunteers has been described by formal organizations as a mixed blessing. Disasters involve the convergence of people, resources, and information. Naturally, some of the people, some of the resources, and some of the information are actually helpful in the disaster response, but not all of them (Fritz and Mathewson, 1957).
This chapter focuses on how large groups of people can contribute effectively, via the Internet, to response and relief efforts. In addition to obstacles regarding the integration of their contributions into the work of formal organizations, further challenges include recruiting volunteers, keeping them engaged, and ensuring a high-quality output.
The 2010 earthquake in Haiti represented, in more than one sense, a collision between traditional crisis information processing practices and new information dynamics. Emergency relief organizations were not prepared to deal with high-volume data flows coming from two new sources. First, mobile-enabled communication technologies were being used to send a large number of messages by affected populations, who expected an answer from relief organizations. Second, vast quantities of data were being produced by volunteers in technical communities (Harvard Humanitarian Initiative, 2011, p. 19). In general, the amount of data generated during a crisis is overwhelming. Processing crisis-relevant social media messages requires careful attention to scalability issues, particularly because the production and consumption of data often surges unpredictably by several orders of magnitude.
This chapter focuses on the data volume, and presents scalable methods to acquire, store, index, and retrieve social media messages, with an emphasis on their textual content. We describe the data sizes that are typical of social media during disasters (§2.1), and methods to acquire (§2.2) and filter (§2.3) data. We then present methods for data representation (§2.4) as well as data indexing and storage (§2.5).
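As a minimal illustration of the filtering step, the sketch below keeps only messages matching a small crisis-related term list; the keyword list and function names are ours, for illustration, and real filters use much richer lexicons and models:

```python
# A minimal sketch of keyword-based filtering during data acquisition:
# keep only messages matching a crisis-related term list.
KEYWORDS = {"earthquake", "quake", "tremor", "#nepal"}

def is_relevant(message):
    """True if any token (after crude punctuation stripping) is a keyword."""
    tokens = message.lower().split()
    return any(tok.strip(".,!?") in KEYWORDS for tok in tokens)

stream = [
    "Strong tremor felt in Kathmandu",
    "Having lunch with friends",
    "Buildings damaged after the earthquake!",
]
relevant = [m for m in stream if is_relevant(m)]
print(relevant)
```

Keyword filters are cheap enough to apply at acquisition time, before the more expensive processing described in later chapters.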
Social Media Data Sizes
Any characterization of social media risks becoming outdated quickly. The Internet Live Stats project maintains a dizzying display of visual statistics depicting how much content is generated every day by social media users.
Social media platforms usually report the number of users they have in terms of monthly active users, defined as people who interact with the platform at least once during a month. For the large platforms, this figure is usually measured in the order of hundreds of millions. Every day, the number of messages posted in large social media platforms such as Twitter, Facebook, and Instagram is in the order of tens of millions to hundreds of millions of messages, and hundreds of thousands of hours of video are uploaded to YouTube.
In the case of microtext, while each message is short (e.g., currently a maximum of 140 characters in Twitter, and 420 characters in Facebook status updates), the metadata attached to messages causes a blowup in data sizes. A data record for a Twitter message, typically serialized as a string in JSON, is around 4 KB when all the formatting and metadata attached to each message are included.
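The blowup can be illustrated with a small sketch. The record below is illustrative and does not reproduce the exact Twitter schema, but it shows how author, entity, and location metadata dwarf the message text itself:

```python
import json

# A minimal sketch of metadata blowup: a 140-character text is a small
# fraction of the full serialized record (fields are illustrative, not
# the exact Twitter schema).
record = {
    "id": 123456789012345678,
    "text": "x" * 140,
    "created_at": "Mon Apr 27 10:05:00 +0000 2015",
    "user": {"id": 42, "name": "example", "followers_count": 1000,
             "location": "Kathmandu", "description": "y" * 160},
    "entities": {"hashtags": ["Nepal"], "urls": [], "user_mentions": []},
    "coordinates": {"type": "Point", "coordinates": [85.32, 27.70]},
}
serialized = json.dumps(record)
print(len(serialized), "bytes total vs", len(record["text"]), "bytes of text")
```

Multiplied by tens of millions of messages per day, this per-record overhead is what drives the storage and indexing requirements discussed in §2.5.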
The phrase “Twitter revolution” was coined in 2009 to describe the role of Twitter in “viral” calls to demonstrations against fraud in elections in Moldova and Iran. Since then, the rapid propagation of information through social media and mobile text messages has played an important role in the recruitment for protests in the Arab world, Europe, and America (González-Bailón et al., 2011). This rapid propagation has also been observed during disasters; for instance, reposting activity has been observed to increase significantly in these situations (Starbird and Palen, 2010). A distinctive feature that is a consequence of these processes is the appearance of explosive “bursts” of messages that reach large masses of people in a relatively short time frame.
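A naive way to surface such bursts is to flag time windows whose message count exceeds the historical mean by several standard deviations. The sketch below is deliberately simple (counts, threshold, and names are ours, for illustration):

```python
# A minimal sketch of burst detection: flag windows whose message count
# exceeds the mean of the preceding windows by k standard deviations.
def detect_bursts(counts, k=3.0):
    bursts = []
    for i in range(1, len(counts)):
        history = counts[:i]
        mean = sum(history) / len(history)
        var = sum((c - mean) ** 2 for c in history) / len(history)
        std = var ** 0.5
        # max(std, 1.0) avoids a zero threshold on flat histories
        if counts[i] > mean + k * max(std, 1.0):
            bursts.append(i)
    return bursts

# messages per minute; an explosive burst starts at index 5
per_minute = [10, 12, 9, 11, 10, 80, 75, 12]
print(detect_bursts(per_minute))
```

Note a limitation of this naive approach: once the burst at index 5 enters the history, it inflates the mean and standard deviation, so the continuation of the burst at index 6 is not flagged; more robust detectors use decayed or robust statistics.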
Sociograms, which are graphs in which nodes represent people and edges represent social connections, started to be recorded and analyzed systematically in the early 1930s (Moreno, 1934). Currently, sociograms having hundreds of millions of nodes, which we now identify with social networks, continue to fuel the growth of an enormous body of research that includes work by sociologists, psychologists, communication scholars, physicists, computer scientists, and interdisciplinary teams. In addition to holding connections among users, current
social networking sites allow the creation of information networks, graphs in which nodes receive and disseminate information to other nodes through links. Social networks are a defining element of social media, and graph theory provides a theoretical foundation for studying social networks, including aspects such as the mechanism for the formation of online connections and the propagation of information online.
This chapter studies two interrelated aspects of social media during crises from a graph-theory perspective. First, there are structural properties of social and information networks that allow us, for instance, to measure certain properties of the users or groups of users who create or propagate information (§5.1). Second, there are particular types of information cascades – the history of the propagation of a particular piece of content on a network (§5.2) – that can help us, for instance, determine the characteristics of messages, groups of messages, and users (§5.3).
Crisis Information Networks
Social media and other online systems include social networks in two ways: explicitly or implicitly, also known as articulated or behavioral, respectively.
Explicit (articulated) social networks are created by people specifying who they are connected with, for instance by adding someone as a “friend,” or by “following” someone in social media.
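An articulated network of this kind can be represented directly as a directed graph. The sketch below uses a hypothetical edge list of “a follows b” pairs and computes in-degree (follower count), one of the simplest structural properties; all names are made up for illustration:

```python
from collections import defaultdict

# A minimal sketch of an articulated ("follows") network as a directed
# graph: edges are (follower, followee) pairs, and in-degree counts
# followers, a crude proxy for a user's potential reach.
follows = [("ana", "bob"), ("carl", "bob"), ("dana", "bob"),
           ("bob", "ana"), ("carl", "ana")]

in_degree = defaultdict(int)
for follower, followee in follows:
    in_degree[followee] += 1

# users ranked by follower count
ranking = sorted(in_degree, key=in_degree.get, reverse=True)
print(ranking[0], in_degree[ranking[0]])
```

More informative structural measures (PageRank, betweenness, clustering coefficients) build on this same graph representation.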
When faced with a sudden crisis, people quickly try to gather as much information as they can from the sources most immediately available to them: those in their immediate vicinity, friends and acquaintances via phone and texts, governments, nongovernment organizations, mass media such as radio and television, the Internet, and social media (Gao et al., 2014). Based on this information, they quickly take cover, flee, or act in a way that keeps them away from danger (Dynes, 1994).
Popular depictions of human response to crisis in movies and TV series tend to show widespread mayhem and panic. These scenes in “disaster movies” are plot devices, not very different from typical scenes in horror movies in which people irrationally run straight into danger (Mitchell et al., 2000). They are part of a long-standing myth that understands emergency management from a managerial perspective (Calhoun, 2004), and perpetuates the idea that disasters need to be policed because otherwise people will panic and riot.
As Palen (2014) emphasizes, an agenda of research about social media on disasters can uncover fascinating points if it avoids these misconceptions, and pays attention to the pro-social behavior of people during an emergency. Most people do not panic, but instead rapidly and effectively collect information, make decisions, and coordinate with others through a variety of channels. People affected by a disaster are the first to respond to it, often improvising complex rescue operations that save lives.
During a crisis, everybody involved – the public, the media, the government, emergency services, relief organizations, and others – tries to quickly gain situational awareness. This is a complex process, which involves perceiving, comprehending, and being able to make predictions about the near future (Endsley, 1995; Vieweg, 2012). Gaining situational awareness is essentially a collective intelligence process that involves many actors interacting with a combination of various sources of information (Hutchins, 1995; Palen et al., 2010). Social media can contribute to situational awareness during a crisis, but its volume and complexity make it impractical for analysts to use directly.
This book is about how to use computing to help bridge this gap. This chapter explains the importance of social media for crisis management, exemplifying through recent crisis situations (§1.1). It provides some key concepts (§1.2) and describes information flows happening in social media during disasters (§1.3).
In 2005 the Inter-Agency Standing Committee (IASC), a permanent forum including agencies from the United Nations (UN) and agencies not belonging to the UN (such as the Red Cross), introduced a number of reforms designed to improve humanitarian response. A visible reform was the establishment of the Cluster System, which organizes large-scale multiagency humanitarian response into eleven areas of action, each one with its own responsibilities: health, protection, food security, emergency telecommunication, early recovery, education, sanitation, water and hygiene, logistics, nutrition, emergency shelter, and camp management and coordination. The Cluster System is not without critics, but it serves to structure response, and it is welcomed by national governments because it introduces a single focal point accountable for each specific response area.
This chapter describes methods for automatic text categorization, which allow us to make sense of heterogeneous, varied messages by sorting them into categories. In the same way in which coordination among humanitarian agencies is facilitated by abstracting from specific response actions to response areas, coping with typical crisis collections from social media, involving millions of messages, is made easier by abstracting from the particular (a specific message) to the general (a class of messages).
There are two broad families of classification methods: supervised and unsupervised. In supervised classification, we first manually classify a set of items (messages in this case) into categories using human annotators, and then use these example items to automatically learn a model for classifying new, unseen items into the same categories. In unsupervised classification (or clustering), we do not provide any example item classified a priori, but instead allow a method to discover groups of related items based on their similarity.
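As a toy illustration of the supervised setting, the sketch below trains a multinomial Naive Bayes classifier on four hand-labeled messages. The texts, labels, and all names are made up; real systems train on thousands of annotated examples and use richer features:

```python
import math
from collections import Counter, defaultdict

# A minimal sketch of supervised classification: a toy multinomial
# Naive Bayes with add-one smoothing, trained on four labeled messages.
train = [
    ("bridge collapsed people trapped", "damage"),
    ("building damage near the river", "damage"),
    ("donate blood at central hospital", "donations"),
    ("collecting food and water donations", "donations"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def classify(text):
    """Return the most likely class for an unseen message."""
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(train))  # log prior
        for w in text.split():
            # add-one smoothing over the shared vocabulary
            score += math.log((word_counts[label][w] + 1) /
                              (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("water donations needed"))
```

An unsupervised method, by contrast, would receive the same four texts without labels and group them purely by lexical similarity, discovering the two clusters on its own.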
We begin with a description of the main information categories found in social media and short text messages during crises (§4.1). Next, we introduce supervised (§4.2) and unsupervised classification methods (§4.3).
The first question when categorizing content is how to determine which information categories to use. There are many factors that drive the design of these categories. The first and most important are the information needs of the users for whom the categorization is done, which may include emergency managers, humanitarian relief workers, policy makers, analysts, and/or the public.