With the rise of social media, the web has become a vibrant and lively realm in which billions of individuals all around the globe interact, share, post, and conduct numerous daily activities. Information is collected, curated, and published by citizen journalists and simultaneously shared or consumed by thousands of individuals, who give spontaneous feedback. Social media enables us to be connected and interact with each other anywhere and anytime, allowing us to observe human behavior at an unprecedented scale with a new lens. This social media lens provides us with golden opportunities to understand individuals at scale and to mine human behavioral patterns that would otherwise be impossible to observe. As a byproduct, by understanding individuals better, we can design better computing systems tailored to individuals' needs that will serve them and society better. This new social media world has no geographical boundaries and incessantly churns out oceans of data. As a result, we are facing an exacerbated problem of big data – “drowning in data, but thirsty for knowledge.” Can data mining come to the rescue?
Unfortunately, social media data differs significantly from the traditional data we are familiar with in data mining. Apart from its enormous size, this mainly user-generated data is noisy and unstructured, and it comes with abundant social relations such as friendships and follower-followee links. This new type of data mandates new computational data analysis approaches that combine social theories with statistical and data mining methods. The pressing demand for new techniques ushers in a new interdisciplinary field – social media mining.
Mountains of raw data are generated daily by individuals on social media. Around 6 billion photos are uploaded monthly to Facebook, the blogosphere doubles every five months, 72 hours of video are uploaded every minute to YouTube, and there are more than 400 million daily tweets on Twitter. With this unprecedented rate of content generation, individuals are easily overwhelmed with data and find it difficult to discover content that is relevant to their interests. To overcome these challenges, we need tools that can analyze these massive unprocessed sources of data (i.e., raw data) and extract useful patterns from them. Examples of useful patterns in social media are those that describe online purchasing habits or individuals' website visit duration. Data mining provides the necessary tools for discovering patterns in data. This chapter outlines the general process for analyzing social media data and ways to use data mining algorithms in this process to extract actionable patterns from raw data.
The process of extracting useful patterns from raw data is known as knowledge discovery in databases (KDD); it is illustrated in Figure 5.1. The KDD process takes raw data as input and provides statistically significant patterns found in the data (i.e., knowledge) as output. From the raw data, a subset is selected for processing and is denoted as target data. Target data is preprocessed to make it ready for analysis using data mining algorithms. Data mining is then performed on the preprocessed (and transformed) data to extract interesting patterns. The patterns are evaluated to ensure their validity and soundness and are interpreted to provide insights into the data.
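As a minimal sketch of how these stages fit together in practice (the data file, the chosen columns, and the use of scikit-learn's k-means as the mining step are assumptions of this illustration, not of the chapter), the KDD process might look like the following in Python:

    # Illustrative KDD sketch: selection -> preprocessing -> mining -> evaluation.
    # The file name, columns, and algorithm choice are hypothetical.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    raw = pd.read_csv("site_logs.csv")                        # raw data (hypothetical)
    target = raw[["visit_duration", "pages_viewed"]].dropna() # selection: target data
    X = StandardScaler().fit_transform(target)                # preprocessing/transformation
    model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # data mining
    print("silhouette:", silhouette_score(X, model.labels_))  # evaluation of the patterns

Here the "patterns" are groups of users with similar visit habits, and the silhouette score is one simple way to check that they are meaningful before interpreting them.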
In this chapter, we introduce the discrete Fourier transform (DFT), which may be viewed as an economy-class DTFT and is applicable when x[n] is of finite length (or is made finite length by windowing). The DFT is one of the most important tools of digital signal processing, especially when implemented using the efficient fast Fourier transform (FFT) algorithm, discussed in Sec. 9.7. The development of the FFT algorithm in the mid-1960s gave a huge impetus to the area of DSP. The DFT, computed using the FFT algorithm, is truly the workhorse of modern digital signal processing, and it is nearly impossible to exaggerate its importance. A solid understanding of the DFT is a must for anyone aspiring to work in the digital signal processing field. Not only does the DFT provide a frequency-domain representation of DT signals, it is also useful for numerous other tasks such as FIR filtering, spectral analysis, and solving partial differential equations.
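For reference, the analysis and synthesis pair of the N-point DFT, written here in generic notation rather than with this chapter's own equation numbers, is

    X[k] = Σ_{n=0}^{N−1} x[n] e^{−j2πkn/N},        k = 0, 1, …, N − 1,
    x[n] = (1/N) Σ_{k=0}^{N−1} X[k] e^{j2πkn/N},   n = 0, 1, …, N − 1.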
Computation of the Direct and Inverse DTFT
As we saw in Ch. 6, frequency analysis of discrete-time signals involves determination of the discrete-time Fourier transform (DTFT) and its inverse (IDTFT). The DTFT analysis equation of Eq. (6.1) yields the frequency spectrum X(Ω) from the time-domain signal x[n], and the synthesis equation of Eq. (6.2) reverses the process and reconstructs x[n] from X(Ω). There are, however, two difficulties in the implementation of these equations on a digital processor or computer.
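As a small numerical illustration of where the DFT fits in (NumPy and the particular signal below are assumptions of this sketch, not of the text), sampling the DTFT of a finite-length x[n] at N uniformly spaced frequencies Ω_k = 2πk/N reproduces exactly the N-point DFT returned by the FFT:

    # Sketch: DTFT samples of a finite-length signal versus the N-point FFT.
    import numpy as np

    x = np.array([1.0, 2.0, 0.5, -1.0])   # an arbitrary finite-length signal
    N = len(x)
    n = np.arange(N)
    Omega = 2 * np.pi * np.arange(N) / N  # N frequencies on [0, 2*pi)

    # Direct summation of the DTFT analysis equation at each Omega_k.
    X_dtft = np.array([np.sum(x * np.exp(-1j * w * n)) for w in Omega])

    # N-point DFT computed with the FFT algorithm.
    X_dft = np.fft.fft(x)

    print(np.allclose(X_dtft, X_dft))     # True: the sampled DTFT equals the DFT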
Social forces connect individuals in different ways. When individuals get connected, one can observe distinguishable patterns in their connectivity networks. One such pattern is assortativity, also known as social similarity. In networks with assortativity, similar nodes are connected to one another more often than dissimilar nodes. For instance, in social networks, a high similarity between friends is observed. This similarity is exhibited by similar behavior, similar interests, similar activities, and shared attributes such as language, among others. In other words, friendship networks are assortative. Investigating assortativity patterns that individuals exhibit on social media helps one better understand user interactions. Assortativity is the most commonly observed pattern among linked individuals. This chapter discusses assortativity along with principal factors that result in assortative networks.
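As a purely illustrative way to quantify this pattern (NetworkX and the toy network below are assumptions of this sketch, not part of the chapter), one can compute an attribute assortativity coefficient over a friendship graph:

    # Toy friendship network: nodes that share a language are linked more often.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"),   # one language community
                      ("d", "e"), ("e", "f"), ("f", "d"),   # another language community
                      ("c", "d")])                          # a single cross-community tie
    nx.set_node_attributes(G, {"a": "en", "b": "en", "c": "en",
                               "d": "ru", "e": "ru", "f": "ru"}, "language")

    # Close to +1 when similar nodes are connected more often than dissimilar ones.
    print(nx.attribute_assortativity_coefficient(G, "language"))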
Many social forces induce assortative networks. Three common forces are influence, homophily, and confounding. Influence is the process by which an individual (the influential) affects another individual such that the influenced individual becomes more similar to the influential figure. Homophily is observed in already similar individuals. It is realized when similar individuals become friends due to their high similarity. Confounding is the environment's effect on making individuals similar. For instance, individuals who live in Russia speak Russian fluently because of the environment and are therefore similar in language. The confounding force is an external factor that is independent of inter-individual interactions and is therefore not discussed further.
Individuals in social media make a variety of decisions on a daily basis. These decisions concern buying a product, purchasing a service, adding a friend, or renting a movie, among other things. The individual often faces many options to choose from. These diverse options, the pursuit of optimality, and each individual's limited knowledge create a desire for external help. At times, we resort to search engines for recommendations; however, search results are rarely tailored to our particular tastes: they depend on the query, not on the individual who issues it.
Applications and algorithms have been developed to help individuals decide more easily, rapidly, and accurately. These algorithms are tailored to individuals' tastes such that customized recommendations become available for them. They are called recommendation algorithms or recommender systems.
Recommender systems are commonly used for product recommendation. Their goal is to recommend products that would be interesting to individuals. Formally, a recommendation algorithm takes a set of users U and a set of items I and learns a function f such that
f : U × I → R (9.1)
In other words, the algorithm learns a function that assigns a real value to each user-item pair (u, i), where this value indicates how interested user u is in item i. This value denotes the rating given by user u to item i. Recommendation algorithms are not limited to item recommendation and can be generalized to recommending people and material such as ads or content.
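As a minimal sketch of such a function f (the latent-factor model, the learning rate, and the toy ratings below are assumptions of this illustration, not the book's prescribed method), one can fit user and item factors to observed ratings and use their inner product as the predicted rating:

    # Illustrative latent-factor recommender: learn f : U x I -> R from a few ratings.
    import numpy as np

    ratings = {("u1", "i1"): 5.0, ("u1", "i2"): 3.0,
               ("u2", "i1"): 4.0, ("u2", "i3"): 1.0}
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})

    rng = np.random.default_rng(0)
    d = 2                                                    # latent factors per user/item
    P = {u: rng.normal(scale=0.1, size=d) for u in users}    # user factors
    Q = {i: rng.normal(scale=0.1, size=d) for i in items}    # item factors

    for _ in range(2000):                                    # SGD on squared rating error
        for (u, i), r in ratings.items():
            err = r - P[u] @ Q[i]
            P[u], Q[i] = P[u] + 0.01 * err * Q[i], Q[i] + 0.01 * err * P[u]

    def f(u, i):
        """Predicted interest (a real value) of user u in item i."""
        return float(P[u] @ Q[i])

    print(f("u2", "i2"))                                     # prediction for an unseen pair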
In May 2011, Facebook had 721 million users, represented by a graph of 721 million nodes. A Facebook user at the time had an average of 190 friends; that is, taken together, Facebook users had a total of 68.5 billion friendships (i.e., edges). What are the principal underlying processes that help initiate these friendships? More importantly, how do these seemingly independent friendships form this complex friendship network?
In social media, many social networks contain millions of nodes and billions of edges. These complex networks have billions of friendships, and the reasons most of them exist are obscure. Humbled by the complexity of these networks and the difficulty of independently analyzing each of these friendships, we can design models that generate, on a smaller scale, graphs similar to real-world networks. On the assumption that these models simulate the properties observed in real-world networks well, the analysis of real-world networks boils down to cost-efficient measurement of the corresponding properties of the simulated networks. In addition, these models
• allow for a better understanding of phenomena observed in real-world networks by providing concrete mathematical explanations and
• allow for controlled experiments on synthetic networks when real-world networks are not available.
We discuss three principal network models in this chapter: the random graph model, the small-world model, and the preferential attachment model. These models are designed to accurately model properties observed in real-world networks. Before we delve into the details of these models, we discuss their properties.
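As a hedged illustration (NetworkX and the specific parameter values are assumptions of this sketch, not the chapter's), the three models can be generated and compared on two frequently examined properties, average clustering and average path length:

    # Generate small instances of the three network models and compare properties.
    import networkx as nx

    n = 1000
    graphs = {
        "random graph (Erdos-Renyi)":         nx.gnp_random_graph(n, p=0.01, seed=1),
        "small world (Watts-Strogatz)":       nx.watts_strogatz_graph(n, k=10, p=0.1, seed=1),
        "pref. attachment (Barabasi-Albert)": nx.barabasi_albert_graph(n, m=5, seed=1),
    }

    for name, G in graphs.items():
        # Average shortest path length is defined on a connected graph,
        # so restrict it to the largest connected component.
        giant = G.subgraph(max(nx.connected_components(G), key=len))
        print(name,
              "avg clustering = %.3f" % nx.average_clustering(G),
              "avg path length = %.2f" % nx.average_shortest_path_length(giant))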
MATLAB (a registered trademark of MathWorks, Inc.) is a language and interactive environment for numeric computation, algorithm development, data analysis, and data visualization. It is particularly well suited for digital signal processing applications. While MATLAB is relatively simple to use, it is a sophisticated package that can be a bit intimidating to new users. Fortunately, there are many excellent resources on MATLAB and its use, including MATLAB's own built-in documentation. This appendix provides a brief introduction to MATLAB that complements the MATLAB material found throughout this book.
Scripts and Help
When MATLAB is launched, a command window appears. Users issue commands at the command prompt (≫), and MATLAB responds, generally with the creation or modification of workspace objects. MATLAB supports different types of workspace objects, such as functions and strings, but usually objects are just data. The workspace window summarizes the names and important characteristics of currently available objects.
While users can directly input sequences of commands at the command prompt, it is generally preferable to instead use a MATLAB script file (M-file). A MATLAB script file is simply a text file (.m extension) that contains a collection of MATLAB statements. Comments are inserted by preceding text with a percent sign (%). Any text editor can be used to create an M-file, but MATLAB's built-in editor provides added functionality such as color-coded text, breakpoints, and various other features. M-files are easy to modify and facilitate rapid algorithm development. M-files are executed directly from MATLAB's editor or by typing the M-file name (without the .m extension) at the command prompt.
What motivates individuals to join an online group? When individuals abandon social media sites, where do they migrate to? Can we predict box office revenues for movies from tweets posted by individuals? These questions are a few of many whose answers require us to analyze or predict behaviors on social media.
Individuals exhibit different behaviors in social media: they act either as individuals or as part of a broader collective behavior. When discussing individual behavior, our focus is on one individual. Collective behavior emerges when a population of individuals behaves in a similar way, with or without coordination or planning.
In this chapter we provide examples of individual and collective behaviors and elaborate techniques used to analyze, model, and predict these behaviors.
Individual Behavior
We read online news; comment on posts, blogs, and videos; write reviews for products; post; like; share; tweet; rate; recommend; listen to music; and watch videos, among many other daily behaviors that we exhibit on social media. What are the types of individual behavior that leave a trace on social media?
We can generally categorize individual online behavior into three types (shown in Figure 10.1):
1. User-User Behavior. This is the behavior individuals exhibit with respect to other individuals. For instance, when befriending someone, sending a message to another individual, playing games, following, inviting, blocking, subscribing, or chatting, we are demonstrating a user-user behavior.
This paper presents the design of a controller that allows a four-rotor helicopter to track a desired trajectory in 3D space. To this aim, a dynamic model obtained from the Euler-Lagrange equations describes the robot. This model is handled with numerical methods, from which the control actions for the operation of the system are obtained. The proposed controller is simple and performs well in the face of uncertainties in the model of the system to be controlled. A zero-convergence proof is included, and simulation results show good performance of the control system.
In global localization under the particle filter framework, acquiring effective observations for the whole particle system is strongly affected by the uncertainty of the prior map, such as unspecific structures and noise. In this study, taking the uncertainty of the prior map into account, a localizability-based action selection mechanism for mobile robots is proposed to accelerate the convergence of global localization. Localizability is defined to evaluate the observations according to the prior-map (probabilistic grid map) and observation (laser range-finder) models, based on the Cramér-Rao bound. The evaluation accounts for the uncertainty of the prior map and does not need to extract any specific observation features. Essentially, localizability is the determinant of the inverse covariance matrix for localization. Specifically, at the beginning of every filtering step, the action that gives the whole particle system the maximum localizability distinctness is selected as the actual action. This increases the opportunity to accelerate the convergence of the particles, especially in the face of an uncertain prior map. Additionally, the computational complexity of the proposed algorithm does not increase significantly, as the localizability is pre-cached off-line. In simulations, the proposed active algorithm is compared with a passive algorithm (i.e., global localization with random robot actions) in environments with different degrees of uncertainty. In experiments, the effectiveness of the localizability measure is verified, and comparative experiments are then conducted on an intelligent wheelchair platform in a real environment. Finally, the experimental results are compared and analyzed against existing active algorithms. The results demonstrate that the proposed algorithm accelerates the convergence of global localization and enhances robustness against system ambiguities, thereby reducing the probability of convergence failure.
Most existing source search algorithms suffer from high travel cost, and the performance of few of them has been analyzed in noisy environments where local basins are present. In this paper, the Theseus gradient search (TGS) is proposed to effectively overcome local basins during search. Analytical performances of TGS and of gradient ascent with correlated random walk (GACRW), a variant of correlated random walk, are derived and compared. A gradient field model is proposed as an analytical tool that makes it feasible to analyze these performances. The analytical average search costs of GACRW and TGS are obtained for the first time for this class of algorithms in environments with local basins. The costs, expressed as functions of search space size, local basin size, and local basin number, are confirmed by simulation results. The performances of GACRW, TGS, and two chemotaxis algorithms are compared in the gradient field and in a scenario of indoor radio source search in a hallway, driven by real signal-strength data. The results illustrate that GACRW and TGS are robust to noisy gradients and are more competitive than the chemotaxis-based algorithms in real applications. Both analytical and simulation results indicate that, in the presence of local basins, TGS almost always incurs the lowest cost.