Constant-Factor Approximation Algorithms for Identifying ...

Constant-Factor Approximation Algorithms for Identifying ...

Constant-Factor Approximation Algorithms for Identifying Dynamic Communities Chayant Tantipathananandh with Tanya Berger-Wolf Social Networks These are snapshots and networks change over time Dynamic Networks t=1 1

2 3 t=2 1 2 3 5 4 5

2 3 4 1 4 1 5 2 5

5 4 2 3 5 2 5 4 2

t=2 4 1 3 1 Interactions occur in the form of disjoint groups Groups are not communities t=1

2 1 1 3 4 Aggregated network 2 3

1 Communities What is community? Cohesive subgroups are subsets of actors among whom there are relatively strong, direct, intense, frequent, or positive ties. [Wasserman & Faust 1994] Dynamic Community Identification

GraphScope [Sun et al 2005] Metagroups [Berger-Wolf & Saia 2006] Dynamic Communities [TBK 2007] Clique Percolation [Palla et al 2007] FacetNet [Lin et al 2009] Bayesian approach [Yang et al 2009] Ship of Theseus from Wikipedia

The ship was preserved by the Athenians , for they took away the old planks as they decayed, putting in new and stronger timber in their place, insomuch that this ship became a standing example among the philosophers, for the logical question of things that grow; one side holding that the ship remained the same, and the other contending that it was not the same. [Plutarch, Theseus] Jeannot's knife has had its blade changed fifteen times and its handle fifteen times, but is still the same knife. [French story] Ship of Theseus Individual Cost for parts

changing never change identity identities Ship of Theseus Costs Identity for changes visiting and

to match beingthe absent group Approach Community = Color Valid coloring: In each time step, different groups have different colors.

Interpretation Group color: How does community c interact at time t? Interpretation 2 2 2 1 1

2 2 Individual color: Who belong to community c at time t? 1 1 1

Social Costs: Conservatism 2 2 2 2

2 2 2 2 2 2

Switching cost Absence cost 1 Visiting cost 2 Social Costs: Loyalty 3 3 1 2 1

3 1 1 2 3 1

1 1 1 Switching cost Absence cost 1 1 Visiting cost 2 Social Costs: Loyalty

2 2 2 3 2 3 2

2 Switching cost Absence cost 1 Visiting cost 2 Problem Complexity Minimizing total cost is hard NP-complete and APX-hard [with Berger-Wolf and Kempe 2007] Constant-Factor Approximation [details in paper]

Easy special case If no missing individuals and 2 2 , then simply weighted bipartite matching [details in paper] Group Graph Approximation via bipartite matching assume all individuals are observed at all time steps Greedy Approximation

No visiting or absence and minimizing switching time Greedy Approximation 3 4 2 maximizing path coverage 3

Greedy alg guarantees 7 2 No visiting or absence and minimizing switching max{2, 2/1, 4/2} in , 1, 2, independent of input size

3 4 time 3 Improvement by dynamic programming Southern Women Data Set [DGG 1941] 18 individuals, 14 time steps Collected in Natchez, MS, 1935

aggregated network Ethnography [DGG1941] Core Core note: columns not ordered by time Optimal Communities individuals

time ethnography Core Core all costs equal white circles = unknown Approximate Optimal

time time Core Core ethnography Core Core

Approximation Power 28 inds, 44 times 29 inds, 82 times 313 inds, 758 times Approximation Power 41 inds, 418 times 264 inds, 425 times

96 inds, 1577 times Conclusions Identity of objects that change over time (Ship of Theseus Paradox) Formulate an optimization problem Greedy approximation Fast Near-optimal Future Work Algorithm with guarantee not depending on , 1, 2

Network snapshots instead of disjoint groups Thank You NSF grant, KDD student travel award Jared Saia Chayant Mayank Lahiri David Kempe Arun Maiya

Ilya Fischoff Tanya Berger-Wolf Habiba Saad Sheikh Dan Rubenstein Siva Sundaresan Robert Grossman Anushka Anand Rajmonda Sulo

On the Bursty Evolution of Blogspace Ravi Kumar, Jasmine Novak, Prabhakar Raghavan, Andrew Tomkins IBM Almaden Research Center Blogspace Blogspace Collection of blogs with their links Motivation Sociological Different with traditional web page

Technical From static snapshot to dynamic graphs Background Web communities (Ravi Kumar,1999) groups of individuals who share a common interest characterized by dense directed bipartite subgraphs. Bursty communities of blogs Exhibit striking temporal characteristics Extract the community within a time interval

Time graph time graph G = (V,E) v in V has an associated duaration D(v) e in E is a triple (u, v, t) t is a time in interval D(u) D(v). prefix of G at time t Gt = (Vt,Et) Vt= {v in V | D(v) [0, t] } Et = {(u, v, t) in E| t t} Approach Two step approach Community extraction

Extract dense subgraphs( potential communities) Bust analysis analyze each dense subgraph to identfy and rank bursts in these communities. Community extraction Finding the densest subgraph: NP-hard Two steps: Pruning

Remove vertices of degree no more than one Vertices of degree two are K3 g Output and remove communities (pass a threshold) Repeat the 3 steps above Expanding Determines the vertex containing the most links Add it to the community If the links is larger than tk. Burst analysis Kleinbergs method (SIGKDD 2002)

model the generation of events by an automaton one of two states, low and high. high state is hypothesized as generating bursts of events. a cost is associated with any state transition to discourage short bursts. find a low cost state sequence that is likely to generate the stream. solves the problem of enumerating all the bursts by order of weight( dynamic programming) Tuning the algorithms Expansion in community extraction

Edges must grow to triangles; communities of size up to six will only grow vertices that link to all but one vertex; Communities of size up to nine will only grow vertices that link to all but two vertices; communities up to size 20 will grow only vertices that link to 70% of the community; larger communities will grow only vertices that link to at least 60% of the community Results

Recently Viewed Presentations

  • Proposal Training - Purdue University

    Proposal Training - Purdue University

    Proposal Training Proposal Basics, Tips & Tricks June 29, 2009 ... academic year and the PI will devote 9 months at 30% time/effort and 3 months summer term at 30% time/effort to the project, then 2.7 academic months and .9...
  • Insights from the 2009-2010 Post-Season ... - Carleton University

    Insights from the 2009-2010 Post-Season ... - Carleton University

    Carleton University has 181 awards to give out, 5 to international students. ... Faculty forward successful and reversion applications to FGPA for vetting. FGPA informs students via email of the status of their OGS application by mid February 2020.
  • Basic Tasks - West Virginia University

    Basic Tasks - West Virginia University

    For the purposes of this article, the following words shall have the meanings hereafter ascribed to them unless the context clearly indicates otherwise:(d) "Valuation commission" or "commission" means the commission created in section three of this article.(h) "Electronic" means relating...
  • Genetics & Heredity - Jackson School District

    Genetics & Heredity - Jackson School District

    Sometimes one gene can control multiple phenotypic traits in an individual. A classic example is the human disease PKU (phenylketonuria). This disease can cause mental retardation and reduced hair and skin pigmentation, and can be caused by any of a...
  • Learning Management Systems

    Learning Management Systems

    The primary purpose of Canvas at Rutgers is to enhance the student's learning experience. Acceptable use of Canvas is governed by this principle and determines the types of courses permissible on the system. Sites created in Canvas should have students...
  • A Metropolis - Andreoli's Classes: Window to Learning

    A Metropolis - Andreoli's Classes: Window to Learning

    What is a metropolis? What differentiates a metropolis from other large cities is mostly the power that concentrates there. This power comes from the size of its population. Not only are . services. concentrated in a metropolis but also the...
  • Uniform Convergence of the Generalized Scharfetter-Gummel ...

    Uniform Convergence of the Generalized Scharfetter-Gummel ...

    A Poisson-Fermi Theory for Biological Ion Channels: Models and Methods Jinn-Liang Liu National Hsinchu University of Education In collaboration with Bob Eisenberg Rush University Medical Center, Chicago, USA Dec. 25, 2013 * *
  • National Technical Information Service (NTIS) (NTIS NTIS Handle

    National Technical Information Service (NTIS) (NTIS NTIS Handle

    National Technical Information Service (NTIS) NTIS' Handle System® and the NTIS e-Government Virtual Library Initiative ... & cataloged in the NTIS Database NTIS e-Government Virtual Library Initiative Provide single point of entry to search more than 750,000 reports in the...