The Subspace Enumeration Method

In this post, we briefly explain the main idea of [CDKP23], which uses a novel spectral method named "subspace enumeration" to solve the extensively studied task of approximately counting the number of independent sets on a given bipartite graph G.

The purpose of this post is only to sketch the proof (and its novel idea) in an easy-to-understand way. For more details, please refer to the original paper.


Overview

The problem setting we will discuss about in this post is d-regular (and connected WLOG) bipartite graph G=(X,Y,E) where d=δn.

The d-regular condition is often required in approximate counting problems, and here we will additionally utilize its spectral properties.

First we describe some common approaches towards counting the number of independent sets i(G) on G. For a possible independent set I, enumerating its X-part IX:=IX gives i(G)=AX2|YN(A)|=2|Y|AX2|N(A)|.

Since there might be many subsets AX having the same neighborhood, it seems a more efficient way to enumerate BY and compute the contribution of those having B as (exact) neighborhood together. Denote the collection of these subsets by NB, then NB, is a join-semilattice, implying there exists a greatest element ABNB s.t. every AX with N(A)=B is a subset of AB. We call such subsets closed, and the minimum closed subset containing A the closure of A, denoted by [A]. Now we enumerate closed subsets (of X) instead of subsets of Y, which may contain many "useless" ones (i.e. NB=): i(G)=2|Y|AX closedDA2|N(A)|,DA:=|NN(A)|.

Trying to proceed further, we will soon find that two vertices x,yA may be "correlated" (i.e., have common neighbors), which leads to a difference between |N(A)| and uA|N(u)|. A natural attempt to resolve the term |N(A)| is to consider sets of vertices with no correlation between separately. Formally, we say two vertices u,vV are 2-linked if wV:uwv. A 2-linked subset AX is called a polymer, and polymers are compatible if they are not 2-linked (i.e., not correlated). Let P(X) be all polymers in 2X. With this characterization of correlated vertices, we can uniquely partition A into compatible polymers (also abbreviated as components in below, denoted by K(A)) A1,,Ak s.t. |N(A)|=i[k]|N(Ai)|. Observing that A is closed iff all its components are closed, i(G)=2|Y|AX withall components closedDA2|N(A)|=2|Y|k0{A1,,Ak}P(X)compatible and all closedi[k]DAi2|N(Ai)|,DA:=|NN(A)|=KK(A)|NN(K)|.

A standard step then is to make a discussion on the expansion of (the components of) A. We call a subset AX t-expanding if |N(A)|=|[A]|+t, and correspondingly t-contracting if |N(A)|<|[A]|+t. The subsets with stronger expansion tend to have (exponentially) smaller weights, so we can only enumerate the t0-contracting polymers (which contributes the main part), then approximate the sum of t0-expanding polymers' weights. With the "threshold parameter" t0 properly picked to ensure a provably reasonable deduction, i(G)=2|Y|AX withall components closedand t0-contractingDA2|N(A)|ΞA,ΞA:=k0{B1,,Bk}P(XN2(A))compatible and all not t0-contracting2i[k]|N(Bi)|.

With an additional notation A(G) denoting the collection of all AX with all components closed and t0-contracting, we have the final form i(G)=2YAA(G)DA2|N(A)|ΞA.

Now the task of computing i(G) splits into three parts:

  1. enumerate AA(G),
  2. (approximately) compute DA for all possible AA(G),
  3. (approximately) compute ΞA for all possible AA(G).

The last two parts are mostly identical to that in [JPP21] and do not introduce much novelty, so here we omit the details (maybe I will write notes about [JPP21] and complete these parts in the future) and focus on the first one. To enumerate AA(G) efficiently, the subspace enumeration technique is introduced. As a brief outline, the proof consists of three pieces:

  1. every AA(G) corresponds to a small cut,
  2. a collection Ccut of "principal" cuts can be efficiently constructed using subspace enumeration s.t. every small cut is close to some "principal" cut,
  3. given a "principal" cut CCcut, we can enumerate those small cuts being close to C.

By the term cut we refer to a subset CV=XY, with its value |(C)|:=|(C×(VC))E| defined as the number of edges with precisely one endpoint in C.

The first part holds naturally by setting cut SA=AN(A), where |(SA)|td since every AA(G) is t-contracting.

The Subspace Enumeration Method

The second part starts from a spectral observation: |(C)| can be written as cTLc, where c denotes the indicator vector of C with ground set V, and L=dIA denotes the Laplacian matrix of G. For a d-regular G, Tr(A2)=Nd=i[N]λi2, thus by an averaging argument, at most 4N/d eigenvalues of A have absolute value d/2. By the symmetry of the spectrum of A about 0, at most 4n/d eigenvalues of L are d/2. We denote the precise amount by k, and let E={e1,,eN} be an orthonormal basis of eigenvectors of L with corresponding eigenvalues sorted: 0=μ1μkd2<μk+1μNk<3d2μNk+1μN=2d.

Abusing notations slightly, for a set (or cut) labeled by a capital letter, we use the same letter in lowercase for its indicator vector and its coordinates over E (e.g., s=i[N]siei is the indicator vector of set S). Let U=span({e1,,ek}), intuitively for every small cut S with |(S)|td, s is close to its projection u onto U. (Here we take d as a "unit" of measure because G is d-regular, and of course t1 since even the smallest cut of G will have value d.) To be precise, td|(S)|=sTLs=i[N]μisi2d2i=k+1Nsi2=d2su22, which implies su22t.
Now the goal becomes constructing a collection of cuts "dominating" U. In [CDKP23], a brute-force method is chosen since (WLOG) a cut S has L2-norm s2n: taking an ϵ-net E:={p=i[k]xiei:x1,,xkϵkZ,p2n} is acceptable by bounding its size with |E|(2nk/ϵ)k=nO(1/δ). But it's not sufficient since the lattice points in E may be not an indicator vector (i.e. 0-1 vector). Fortunately, simply rounding them to the nearest 0-1 vector (breaking ties arbitrarily is OK, and we choose 1/21 for consistency) will not introduce much error: suppose the nearest lattice point to u is p and p is rounded to c, we know c is closer to p than s (noticing that s is also an 0-1 vector), thus cp2sp22t+ϵ by the triangle inequality, and furthur cs222t+2ϵ. Let C be the cut corresponding to c and pick ϵ=2, |SC|=sc228(t+1)232t. Also C is a small cut: |(C)||(S)|+d|SC|td+32td=33td.

Here we finsh the second part and state the lemma:

Lemma 2.1 ([CDKP23], 5). Let G=(V,E) be a d-regular bipartite graph on N=2n vertices. There is a set Ccut2V s.t.

  1. |Ccut|nO(1/δ),
  2. t1,SV:(|(S)|tdCCcut:|SC|32t|(C)|33td).

Moreover, Ccut can be constructed in time nO(1/δ).

This lemma is somehow similar to the graph container method: set of representatives "dominating" all objects with certain properties exists and can even be constructed.

Bounded (and Algorithmic) Dominance of "Principal" Cuts

In this section we show the third part. Let CX:=CX be the X-part of "principal" cut C and CY likewise, our goal is to enumerate all small cuts SA corresponding to some AA(G) and is "dominated" by C (in particular, |SAC|32t as stated in Lemma 2.1). For convience we relax the condition and get max(|ACX|,|N(A)CY|)32t.

A straightforward observation is that every vertex in CY having >32t neighbors inside CX is in N(A) (more precisely, N(AX) where AX:=ACX), thus we can reduce the range of A by enumerating subsets SY of SY:={vCY:|N(v)CX|32t} and exclude N(SY) from CX.

Now we have a restriction on AX. Moreover, since |(C)| is small, the number of neighbors of AX outside CY is upper bounded, and |N(AX)CY| is lower bounded: |N(AX)CY||AX||(C)|d|A||ACX||(C)|d=|A|64t For those possible uACX, if N(u)CY is large, then the disjoint union (N(u)CY)(N(AX)CY) is also provably large. A bound is therefore implied by the t-contracting property of A: |A|+t|N(A)||(N(u)CY)(N(AX)CY)||N(u)CY|+|A|64t, that is, |N(u)CY|65t.

Conclude the two parts, we can enumerate A:

  1. inside CX, we enumerate SYSY and set AX=CXN(SY); (if extra vertices are excluded, then either CYSY are not completely covered, or it's the case enumerating other values of SY.)
  2. outside CX, we enumerate SXSX:={uXCX:|N(u)CY|65t} and set A=AXSX.

The number of such A's is thus bounded by 2|SX|+|SY|. Noticing that the vertices in the two sets all have lower bounded contributions to |(C)|, we can hence bound the amount by choosing t suitably: 33td|(C)||SX|(d65t)+|SY|(d32t)d2(|SX|+|SY|) where we choose t<d/130 to derive the last inequality.
Finally we have 2|SX|+|SY|266t and state the following lemma with more general parameters:

Lemma 2.2 (modified from [CDKP23], 6). Fix any c,t1. Given a cut C with |(C)|(c+1)td, the number of closed t-contracting subsets AX s.t. max(|ACX|,|N(A)CY|)ct is at most 2c+11(2c+1)t/dt. Moreover, these sets can be enumerated in time 2c+11(2c+1)t/dtnO(1).

By picking td/(4c+2) we have the bound 22(c+1)t, and inserting c=32 results in the bound we need.

Combining the three parts shown above, we reach the end:

Lemma 2.3 ([CDKP23], 2). For any d-regular bipartite graph G=(X,Y,E) and t028d, the set A(G) has size nO(1/δ)266t0 and can be enumerated in the same time.

References

  • [CDKP23] Charlie Carlson, Ewan Davies, Alexandra Kolla, and Aditya Potukuchi. Approximately Counting Independent Sets in Dense Bipartite Graphs via Subspace Enumeration. SODA 2023, [CoRR abs/2307.09533]
  • [JPP21] M. Jenssen, W. Perkins, and A. Potukuchi. Approximately counting independent sets in bipartite graphs via graph containers. Random Structures & Algorithms, 63(1):215–241, 2023. doi:10.1002/rsa.21145, [CoRR abs/2109.03744].