Let us begin our mathematical analysis by showing how successful learning can be achieved in a relatively simplified setting. Imagine you have just arrived on some small Pacific island. You soon find out that papayas are a significant ingredient in the local diet. However, you have never before tasted papayas. You have to learn how to predict whether a papaya you see in the market is tasty or not. First, you need to decide which features of a papaya your prediction should be based on. On the basis of your previous experience with other fruits, you decide to use two features: the papaya's color, ranging from dark green, through orange and red to dark brown, and the papaya's softness, ranging from rock hard to mushy. Your input for figuring out your prediction rule is a sample of papayas that you have examined for color and softness and then tasted and found out whether they were tasty or not. Let us analyze this task as a demonstration of the considerations involved in learning problems.
Our first step is to describe a formal model aimed at capturing such learning tasks.
A FORMAL MODEL – THE STATISTICAL LEARNING FRAMEWORK
The learner's input: In the basic statistical learning setting, the learner has access to the following:
Domain set: An arbitrary set, X. This is the set of objects that we may wish to label.
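To make the setup concrete, here is a minimal Python sketch of how the papaya example maps onto this formal model: each instance is a point in the domain set described by the two chosen features, and the learner's input is a finite sample of labeled instances. The feature values and the prediction rule below are made up for illustration; they are not part of the text.

# A minimal sketch of the papaya learning setup (illustrative values only).
# Domain set X: each papaya is described by two features in [0, 1]:
#   color    (0 = dark green, 1 = dark brown)
#   softness (0 = rock hard,  1 = mushy)
# Label set Y: 1 = tasty, 0 = not tasty.

# The learner's input: a training sample of labeled papayas
# (feature vector, label) that were examined and then tasted.
training_sample = [
    ((0.55, 0.40), 1),   # orange-red, moderately soft -> tasty
    ((0.10, 0.05), 0),   # dark green, rock hard       -> not tasty
    ((0.95, 0.90), 0),   # dark brown, mushy           -> not tasty
    ((0.60, 0.50), 1),   # red, fairly soft            -> tasty
]

def predict(papaya):
    """A hypothetical prediction rule h: X -> Y.
    It labels a papaya tasty if both features fall in a middle range."""
    color, softness = papaya
    return int(0.3 <= color <= 0.8 and 0.2 <= softness <= 0.7)

# Empirical error of h on the sample: fraction of misclassified examples.
errors = sum(predict(x) != y for x, y in training_sample)
print("empirical error:", errors / len(training_sample))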
In the previous chapter we introduced the families of convex-Lipschitz-bounded and convex-smooth-bounded learning problems. In this section we show that all learning problems in these two families are learnable. For some learning problems of this type it is possible to show that uniform convergence holds; hence they are learnable using the ERM rule. However, this is not true for all learning problems of this type. Yet, we will introduce another learning rule and will show that it learns all convex-Lipschitz-bounded and convex-smooth-bounded learning problems.
The new learning paradigm we introduce in this chapter is called Regularized Loss Minimization, or RLM for short. In RLM we minimize the sum of the empirical risk and a regularization function. Intuitively, the regularization function measures the complexity of hypotheses. Indeed, one interpretation of the regularization function is the structural risk minimization paradigm we discussed in Chapter 7. Another view of regularization is as a stabilizer of the learning algorithm. An algorithm is considered stable if a slight change of its input does not change its output much. We will formally define the notion of stability (what we mean by “slight change of input” and by “does not change the output much”) and prove its close relation to learnability. Finally, we will show that using the squared ℓ2 norm as a regularization function stabilizes all convex-Lipschitz or convex-smooth learning problems. Hence, RLM can be used as a general learning rule for these families of learning problems.
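As a concrete illustration (not taken from the text), here is a minimal Python sketch of the RLM rule with the squared ℓ2 regularizer, instantiated for linear predictors with the squared loss. The closed form follows from setting the gradient of the regularized objective to zero; the data and the value of the regularization parameter are made up.

import numpy as np

# Regularized Loss Minimization with the squared l2 regularizer:
#     A(S) = argmin_w  L_S(w) + lam * ||w||^2
# instantiated for linear predictors with the squared loss,
#     L_S(w) = (1/m) * sum_i (<w, x_i> - y_i)^2,
# for which the minimizer has the closed form
#     w = (X^T X + lam * m * I)^{-1} X^T y.

def rlm_squared_loss(X, y, lam):
    m, d = X.shape
    return np.linalg.solve(X.T @ X + lam * m * np.eye(d), X.T @ y)

# Toy data (made up) just to exercise the rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

print(rlm_squared_loss(X, y, lam=0.1))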
We live in a connected world in which networks are intertwined with our daily life. Networks of air and land transportation help us reach our destinations; critical infrastructure networks that distribute water and electricity are essential for our society and economy to function; and networks of communication help disseminate information at an unprecedented rate. Finally, our social interactions form social networks of friends, family, and colleagues. Social media attests to the growing body of these social networks in which individuals interact with one another through friendships, email, blogposts, buying similar products, and many other mechanisms.
Social media mining aims to make sense of these individuals embedded in networks. These connected networks can be conveniently represented using graphs. As an example, consider a set of individuals on a social networking site where we want to find the most influential individual. Each individual can be represented using a node (circle), and two individuals who know each other can be connected with an edge (line). In Figure 2.1, we show a set of seven individuals and their friendships. Consider a hypothetical social theory that states that “the more individuals you know, the more influential you are.” This theory in our graph translates to the individual with the maximum degree (the number of edges connected to its corresponding node) being the most influential person. Therefore, in this network Juan is the most influential individual because he knows four others, which is more than anyone else. This simple scenario is an instance of many problems that arise in social media, which can be solved by modeling the problem as a graph.
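To make the graph formulation concrete, the following minimal Python sketch finds the maximum-degree individual. The edge list is illustrative rather than the exact friendships of Figure 2.1; only Juan's degree of four is taken from the text.

# Minimal sketch: find the most influential individual under the
# "maximum degree" theory. The friendship list is made up; it is not
# the exact edge set of Figure 2.1.
from collections import defaultdict

friendships = [            # undirected edges (who knows whom)
    ("Juan", "Ana"), ("Juan", "Ben"), ("Juan", "Carl"), ("Juan", "Dana"),
    ("Ana", "Ben"), ("Carl", "Eve"), ("Dana", "Frank"),
]

degree = defaultdict(int)
for u, v in friendships:
    degree[u] += 1
    degree[v] += 1

most_influential = max(degree, key=degree.get)
print(most_influential, degree[most_influential])   # Juan 4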
In February 2013, during the third quarter of Super Bowl XLVII, a power outage stopped the game for 34 minutes. Oreo, a sandwich cookie company, tweeted during the outage: “Power out? No Problem, You can still dunk it in the dark.” The tweet caught on almost immediately, reaching nearly 15,000 retweets and 20,000 likes on Facebook in less than two days. A simple tweet diffused into a large population of individuals. It helped the company gain fame with minimum cost in an environment where companies spent as much as $4 million to run a 30-second ad. This is an example of information diffusion.
Information diffusion is a field encompassing techniques from a plethora of sciences. In this chapter, we discuss methods from fields such as sociology, epidemiology, and ethnography, which can help social media mining. Our focus is on techniques that can model information diffusion.
Societies provide means for individuals to exchange information through various channels. For instance, people share knowledge with their immediate network (friends) or broadcast it via public media (TV, newspapers, etc.) throughout the society. Given this flow of information, different research fields have disparate views of what an information diffusion process is. We define information diffusion as the process by which a piece of information (knowledge) is spread and reaches individuals through interactions.
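As a rough illustration of such a process (not an example from the text), the Python sketch below simulates a simple probabilistic spread over an interaction network in the spirit of basic cascade models: each newly informed individual gets one chance to pass the information to each neighbor with some probability. The network, seed node, and spreading probability are all made up.

import random

# Minimal sketch of information spread on an interaction network:
# each newly informed node tries once to inform each neighbor,
# succeeding with probability p.
def simulate_diffusion(neighbors, seeds, p=0.3, rng=random.Random(1)):
    informed = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_informed = []
        for node in frontier:
            for friend in neighbors.get(node, []):
                if friend not in informed and rng.random() < p:
                    informed.add(friend)
                    newly_informed.append(friend)
        frontier = newly_informed
    return informed

network = {"a": ["b", "c"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
print(simulate_diffusion(network, seeds=["a"]))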
In November 2010, a team of Dutch law enforcement agents dismantled a community of 30 million infected computers across the globe that were sending more than 3.6 billion spam emails daily. These distributed networks of infected computers are called botnets. The computers in a botnet transmit spam or viruses across the web without their owners' permission. The members of a botnet are rarely known; however, it is vital to identify these botnet communities and analyze their behavior to enhance internet security. This is an example of community analysis. In this chapter, we discuss community analysis in social media.
Also known as groups, clusters, or cohesive subgroups, communities have been studied extensively in many fields and, in particular, the social sciences. In social media mining, analyzing communities is essential. Studying communities in social media is important for many reasons. First, individuals often form groups based on their interests, and when studying individuals, we are interested in identifying these groups. Consider, for example, an online book seller that wants to find groups of users with similar reading tastes for recommendation purposes. Second, groups provide a clear global view of user interactions, whereas a local view of individual behavior is often noisy and ad hoc. Finally, some behaviors are only observable in a group setting and not on an individual level. This is because an individual's behavior can fluctuate, but group collective behavior is more robust to change. Consider the interactions between two opposing political groups on social media. Two individuals, one from each group, can hold similar opinions on a subject, but what is important is that their communities can exhibit opposing views on the same subject.
In February 2012, Kobe Bryant, the American basketball star, joined Chinese microblogging site Sina Weibo. Within a few hours, more than 100,000 followers joined his page, anxiously waiting for his first microblogging post on the site. The media considered the tremendous number of followers Kobe Bryant received as an indication of his popularity in China. In this case, the number of followers measured Bryant's popularity among Chinese social media users. In social media, we often face similar tasks in which measuring different structural properties of a social media network can help us better understand individuals embedded in it. Corresponding measures need to be designed for these tasks. This chapter discusses measures for social media networks.
When mining social media, a graph representation is often used. This graph shows friendships or user interactions in a social media network. Given this graph, some of the questions we aim to answer are as follows:
• Who are the central figures (influential individuals) in the network?
• What interaction patterns are common among friends?
• Who are the like-minded users and how can we find these similar individuals?
To answer these and similar questions, one first needs to define measures for quantifying centrality, level of interaction, and similarity, among other qualities. These measures take as input a graph representation of the social interactions, such as a friendship graph given by its adjacency matrix, and compute the measure value from it.
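To make this concrete, here is a small Python sketch (with a made-up friendship network, not one from the text) of how one such measure, degree centrality, is computed from an adjacency matrix.

import numpy as np

# Minimal sketch: degree centrality from an adjacency matrix A,
# where A[i, j] = 1 if users i and j are friends. The matrix below
# is an illustrative four-user friendship network.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
])

degree_centrality = A.sum(axis=1)          # number of friends per user
most_central = int(np.argmax(degree_centrality))
print("degrees:", degree_centrality, "most central user:", most_central)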
We live in an age of big data. With hundreds of millions of people spending countless hours on social media to share, communicate, connect, interact, and create user-generated data at an unprecedented rate, social media has become one unique source of big data. This novel source of rich data provides unparalleled opportunities and great potential for research and development. Unfortunately, more data does not necessarily mean more value; it is the right (or relevant) data that enables us to glean gems. Social media data differs from the traditional data we are familiar with in data mining. Thus, new computational methods are needed to mine the data. Social media data is noisy, free-format, of varying length, and multimedia. Furthermore, social relations among the entities, or social networks, form an inseparable part of social media data; hence, it is important that social theories and research methods be employed with statistical and data mining methods. It is therefore a propitious time for social media mining.
Social media mining is a rapidly growing new field. It is an interdisciplinary field at the crossroads of disparate disciplines deeply rooted in computer science and the social sciences. There is an active community and a large body of literature about social media. The fast-growing interest and intensifying need to harness social media data require research and the development of tools for finding insights from big social media data. This book is one of the intellectual efforts to answer the novel challenges of social media. It is designed to enable students, researchers, and practitioners to acquire fundamental concepts and algorithms for social media mining.
With the rise of social media, the web has become a vibrant and lively realm in which billions of individuals all around the globe interact, share, post, and conduct numerous daily activities. Information is collected, curated, and published by citizen journalists and simultaneously shared or consumed by thousands of individuals, who give spontaneous feedback. Social media enables us to be connected and interact with each other anywhere and anytime, allowing us to observe human behavior on an unprecedented scale and with a new lens. This social media lens provides us with golden opportunities to understand individuals at scale and to mine human behavioral patterns that would otherwise be impossible to observe. As a byproduct, by understanding individuals better, we can design better computing systems tailored to individuals' needs that will serve them and society better. This new social media world has no geographical boundaries and incessantly churns out oceans of data. As a result, we are facing an exacerbated problem of big data: “drowning in data, but thirsty for knowledge.” Can data mining come to the rescue?
Unfortunately, social media data is significantly different from the traditional data that we are familiar with in data mining. Apart from its enormous size, this mainly user-generated data is noisy and unstructured, with abundant social relations such as friendships and follower-followee connections. This new type of data mandates new computational data analysis approaches that can combine social theories with statistical and data mining methods. The pressing demand for new techniques ushers in and entails a new interdisciplinary field: social media mining.
Mountains of raw data are generated daily by individuals on social media. Around 6 billion photos are uploaded monthly to Facebook, the blogosphere doubles every five months, 72 hours of video are uploaded every minute to YouTube, and there are more than 400 million daily tweets on Twitter. With this unprecedented rate of content generation, individuals are easily overwhelmed with data and find it difficult to discover content that is relevant to their interests. To overcome these challenges, we need tools that can analyze these massive unprocessed sources of data (i.e., raw data) and extract useful patterns from them. Examples of useful patterns in social media are those that describe online purchasing habits or individuals' website visit duration. Data mining provides the necessary tools for discovering patterns in data. This chapter outlines the general process for analyzing social media data and ways to use data mining algorithms in this process to extract actionable patterns from raw data.
The process of extracting useful patterns from raw data is known as knowledge discovery in databases (KDD). It is illustrated in Figure 5.1. The KDD process takes raw data as input and provides statistically significant patterns found in the data (i.e., knowledge) as output. From the raw data, a subset is selected for processing and is denoted as target data. Target data is preprocessed to make it ready for analysis using data mining algorithms. Data mining is then performed on the preprocessed (and transformed) data to extract interesting patterns. The patterns are evaluated to ensure their validity and soundness and interpreted to provide insights into the data.
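As a rough illustration of these stages (the records, attributes, and "pattern" below are made up and are not from Figure 5.1), the Python sketch strings the KDD steps together on toy data: selecting target data from the raw data, preprocessing it, mining a simple pattern, and evaluating the result.

# A rough sketch of the KDD pipeline on toy data (all values made up):
# raw data -> target data (selection) -> preprocessing -> data mining
# -> evaluation/interpretation of the discovered pattern.

raw_data = [
    {"user": "u1", "visit_minutes": 12.0, "purchased": True},
    {"user": "u2", "visit_minutes": None, "purchased": False},  # noisy record
    {"user": "u3", "visit_minutes": 3.5,  "purchased": False},
    {"user": "u4", "visit_minutes": 15.0, "purchased": True},
    {"user": "u5", "visit_minutes": 2.0,  "purchased": False},
]

# Selection: keep only the attributes relevant to the question at hand.
target_data = [(r["visit_minutes"], r["purchased"]) for r in raw_data]

# Preprocessing: drop records with missing values.
clean = [(v, p) for v, p in target_data if v is not None]

# Data mining: a trivial "pattern" -- average visit duration per outcome.
def avg(xs):
    return sum(xs) / len(xs)

pattern = {
    "avg_minutes_if_purchased": avg([v for v, p in clean if p]),
    "avg_minutes_if_not": avg([v for v, p in clean if not p]),
}

# Evaluation/interpretation: inspect whether the pattern looks meaningful.
print(pattern)  # buyers stay longer than non-buyers in this toy sample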