Title: | Predict Information Cascade by Self-Exciting Point Process |
---|---|
Description: | An implementation of a self-exciting point process model for information cascades, which occur when many people engage in the same acts after observing the actions of others (e.g. post resharings on Facebook or Twitter). It provides functions to estimate the infectiousness of an information cascade and predict its popularity given the observed history. See http://snap.stanford.edu/seismic/ for more information and datasets. |
Authors: | Hera He, Murat Erdogdu, Qingyuan Zhao |
Maintainer: | Qingyuan Zhao <[email protected]> |
License: | GPL-3 |
Version: | 1.1 |
Built: | 2024-10-27 05:34:34 UTC |
Source: | https://github.com/qingyuanzhao/seismic |
Estimate the infectiousness of an information cascade
get.infectiousness(share.time, degree, p.time, max.window = 2 * 60 * 60, min.window = 300, min.count = 5)
share.time |
observed resharing times, sorted, share.time[1] = 0 |
degree |
observed node degrees |
p.time |
equally spaced vector of times at which to estimate the infectiousness, p.time[1] = 0 |
max.window |
maximum span of the locally weighted kernel |
min.window |
minimum span of the locally weighted kernel |
min.count |
the minimum number of resharings included in the window |
Use a triangular kernel with shape changing over time. At time p.time, use a triangular kernel with slope = min(max(1/(p.time/2), 1/min.window), max.window).
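To make the idea of a triangular kernel concrete, here is a minimal sketch in base R. It assumes a one-sided kernel whose weight falls linearly from 1 at the estimation time to 0 at `window` seconds earlier. The helper name `tri.weights` and the one-sided form are assumptions made for illustration; they are not taken from the package internals.

```r
# Hypothetical helper (not part of seismic): one-sided triangular weights
# for events at times `s`, evaluated at time `t` with window width `window`.
# All quantities are in seconds.
tri.weights <- function(s, t, window) {
  age <- t - s                      # how long ago each event happened
  w <- 1 - age / window             # linear decay from 1 (now) to 0 (window ago)
  w[age < 0 | age > window] <- 0    # drop future events and events outside the window
  w
}

# Events at 0, 150, 300 and 400 seconds, evaluated at t = 400 with a 300 s window
weights <- tri.weights(c(0, 150, 300, 400), t = 400, window = 300)
print(weights)
```

A wider window gives smoother but slower-reacting estimates, which is presumably why the documented kernel changes shape over time within the bounds set by min.window and max.window.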
a list of three vectors:
infectiousness. the estimated infectiousness
p.up. the upper bound of the approximate 95 percent confidence interval
p.low. the lower bound of the approximate 95 percent confidence interval
library(seismic)
data(tweet)
# estimate the infectiousness every minute over the first six hours
pred.time <- seq(0, 6 * 60 * 60, by = 60)
infectiousness <- get.infectiousness(tweet[, 1], tweet[, 2], pred.time)
plot(pred.time, infectiousness$infectiousness)
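The returned list also carries the confidence bounds p.up and p.low, so the estimate can be drawn together with its approximate 95 percent band. The following is a sketch that assumes the three vectors have the same length as pred.time; the variable name `inf` and the plotting choices are illustrative only, and the code is skipped when the seismic package is not installed.

```r
# Sketch: plot the estimated infectiousness with its approximate 95% band.
if (requireNamespace("seismic", quietly = TRUE)) {
  library(seismic)
  data(tweet)
  pred.time <- seq(0, 6 * 60 * 60, by = 60)
  inf <- get.infectiousness(tweet[, 1], tweet[, 2], pred.time)

  # y-axis limits from all finite values of the estimate and both bounds
  y.range <- range(inf$infectiousness, inf$p.up, inf$p.low, finite = TRUE)

  plot(pred.time / 3600, inf$infectiousness, type = "l", ylim = y.range,
       xlab = "Hours since original post", ylab = "Infectiousness")
  lines(pred.time / 3600, inf$p.up, lty = 2)   # upper bound
  lines(pred.time / 3600, inf$p.low, lty = 2)  # lower bound
}
```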
Predict the popularity of an information cascade
pred.cascade(p.time, infectiousness, share.time, degree, n.star = 100, features.return = FALSE)
p.time |
equally spaced vector of times at which to estimate the infectiousness, p.time[1] = 0 |
infectiousness |
a vector of estimated infectiousness, returned by get.infectiousness |
share.time |
observed resharing times, sorted, share.time[1] = 0 |
degree |
observed node degrees |
n.star |
the average node degree in the social network |
features.return |
if TRUE, returns a matrix of features to be used to further calibrate the prediction |
a vector of predicted popularity at each time in p.time.
library(seismic)
data(tweet)
pred.time <- seq(0, 6 * 60 * 60, by = 60)
# estimate the infectiousness first, then feed it to the predictor
infectiousness <- get.infectiousness(tweet[, 1], tweet[, 2], pred.time)
pred <- pred.cascade(pred.time, infectiousness$infectiousness, tweet[, 1], tweet[, 2], n.star = 100)
plot(pred.time, pred)
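One way to judge the predictions is to compare them with the size the cascade eventually reached. The sketch below assumes that the predicted popularity is comparable with the number of rows in the tweet data set (one row per observed time stamp); that correspondence is an assumption, not something stated in this manual. The code is skipped when the seismic package is not installed.

```r
# Sketch: predicted popularity over time against the observed number of rows.
if (requireNamespace("seismic", quietly = TRUE)) {
  library(seismic)
  data(tweet)
  pred.time <- seq(0, 6 * 60 * 60, by = 60)
  inf <- get.infectiousness(tweet[, 1], tweet[, 2], pred.time)
  pred <- pred.cascade(pred.time, inf$infectiousness,
                       tweet[, 1], tweet[, 2], n.star = 100)

  plot(pred.time / 3600, pred, type = "l",
       xlab = "Hours since original post", ylab = "Predicted popularity")
  # assumed reference level: number of observed time stamps in the data set
  abline(h = nrow(tweet), lty = 2)
}
```

If the assumption holds, the curve should settle near the dashed line as more of the history is observed.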
This package implements a self-exciting point process model for information cascades. An information cascade occurs when many people engage in the same acts after observing the actions of others. Typical examples are post/photo resharings on Facebook and retweets on Twitter. The package provides functions to estimate the infectiousness of an information cascade and predict its popularity given the observed history. For more information, see http://snap.stanford.edu/seismic/.
SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity by Q. Zhao, M. Erdogdu, H. He, A. Rajaraman, J. Leskovec, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2015.
A dataset containing all the (relative) resharing times and node degrees of a tweet. The original Twitter ID is 127001313513967616.
A data frame with 15563 rows and 2 columns
relative_time_second. resharing time in seconds
number_of_followers. number of followers
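A quick way to get a feel for the data set is to look at its structure and at how the cascade grew over time. This sketch relies only on the two column names listed above; the variable names `share.sec` and `cum.count` and the plotting choices are illustrative, and the code is skipped when the seismic package is not installed.

```r
# Sketch: inspect the tweet data set and plot the cumulative number of rows.
if (requireNamespace("seismic", quietly = TRUE)) {
  data(tweet, package = "seismic")
  str(tweet)                                    # two columns, one row per time stamp

  share.sec <- sort(tweet$relative_time_second) # sort defensively before accumulating
  cum.count <- seq_along(share.sec)             # running count of observed time stamps

  plot(share.sec / 3600, cum.count, type = "s",
       xlab = "Hours since original post", ylab = "Cumulative count")
  summary(tweet$number_of_followers)            # distribution of node degrees
}
```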