Factoring Cartesian bitwise join of bit vectors

Question

(This question has been substantially revised in an attempt to word it clearly.)

I am wondering if anyone has seen this problem. Let $[n] = \{1,\ldots,n\}$ for an integer $n$. Consider two finite multisets of bitstrings, $$ A=\{a_i\}_{i\in[r]} \quad\text{and} \quad B=\{b_j\}_{j\in[s]} \;,$$ where each $a_i \in \{0, 1\}^n$ and each $b_j \in \{0, 1\}^n$. We may compute from this a "Cartesian bitwise join" $C$, which is also a multiset of bitstrings $C= \{c_{i,j}\}_{(i,j) \in [r] \times [s]}$ for size $|C| = |A| \cdot |B| = rs$, where $$ c_{i,j} = a_i \vee b_j$$ where "$\vee$" is bitwise-OR.

Question. If you are given such a multi-set $C$, without any particular indexing, can you find multi-sets $A$ and $B$ of which $C$ is the Cartesian bit-wise join? Equivalently: given a sequence $C = (c_k)_{k\in[m]}$ consisting of bitstrings in $\{0,1\}^n$, can you find sequences $A = (a_i)_{i\in[r]}$ and $B = (b_j)_{j\in[s]}$ such that $m = rs$, and a bijection $\varphi: [r] \times [s] \to [m]$, such that $$c_{\varphi(i,j)} = a_i \vee b_j$$ for all $(i,j) \in [r] \times [s]$?

This problem is equivalent to the following problem. Consider the ring $R$ whose elements are strings in $\{0,1\}^n$, where addition is done bit-wise by the operation $x + y = x \oplus y \oplus 1$ (where $\oplus$ is parity) with $1^n$ being the identity, and multiplication is the bit-wise OR with $0^n$ being the identity. For some particular ordering of the elements of $A$, we may present $A$ as an element of $R^r$, and similarly present $B$ as an element of $R^s$. The Cartesian bitwise join corresponds to the outer product of $A$ and $B$, $$ C \;=\; A B^\top \;=\; \begin{bmatrix} a_1 \vee b_1 & a_1 \vee b_2 &\cdots& a_1 \vee b_s \\ a_2 \vee b_1 & a_2 \vee b_2 & \cdots & a_2 \vee b_s \\ \vdots & \vdots & & \vdots \\ a_r \vee b_1 & a_r \vee b_2 & \cdots & a_r \vee b_s \end{bmatrix}\;,$$ except interpreted as a multiset rather than a matrix (that is, the elements of $C$ are not assigned to particular coefficients of an $r \times s$ matrix). Given a multiset $C$ with composite cardinality $m \in \mathbb N$, can its elements be mapped to coefficients of an $r \times s$ array (such that $m = rs$) which is an outer product of the above form?

(By interchanging 0s and 1s in all of the bitstrings, we may replace the bit-wise OR "$\vee$" with the bit-wise AND "$\wedge$", in which case we may take $R$ to be the ring whose elements are strings in $\{0,1\}^n$, where addition is as vectors over $\mathbb Z_2$, and multiplication is the bit-wise AND. )

One may consider additional constraints on this problem, such as minimum Hamming weight of the bitstrings in $A$ and $B$ (in the Cartesian join formulation). We certainly require that $A,B \ne \{0^n\}$ to prevent a trivial solution.

This problem arises in an EE & complexity theory context. I am looking for other cases or analysis of it.
The problem is somewhat analogous to factoring but does not seem to have a unique factorization.
Note that bitvectors can be naturally modelled as hypergraphs so maybe something from that theory is applicable. I did research hypergraph products but that theory does not seem to apply directly. The problem translates to finding two hypergraphs $H'$ and $H''$ given a hypergraph $H$, such that the $H$ can be formed by taking all pairwise unions of edges in $H'$ and $H''$.

Why is $|C| = |A| \cdot |B|$ ? I can see $\le$, but not $=$. In other words, are the sizes of the desired $A, B$ also given ? — Suresh Venkat, Oct 15 '12 at 22:31
that reminds me. in some cases the combinations $a_i \vee b_j$ can give the same bitvectors in $C$. am looking for any nontrivial analysis on the problem, pref from the literature, misc assumptions are ok, just looking for some general picture to start.... one immed observation is that the sets could/can be of size $x \cdot y$ where $x, y$ are factors of $|C|$ — vzn, Oct 15 '12 at 22:41
@Suresh $|C| = |A| \cdot |B|$ by definition. The (formal) sets $A$,$B$ are known in advance. Here $|\cdot|$ is not Hamming weight, just length. — Yuval Filmus, Oct 15 '12 at 22:45
am considering both "directions" of this problem but mainly the "reverse" direction where $C$ is given but $A,B$ are not given & are to be determined. yes sorry for any ambiguity, $|A|$ above is just the number of bitvectors in the set. — vzn, Oct 15 '12 at 22:50
This is a very odd definition. So $C$ is a two-dimensional matrix which is serialized as a one-dimensional vector, and we don't know its dimensions, only the size of the vector. Is that correct? Where does this definition arise? — Yuval Filmus, Oct 16 '12 at 02:54
isn't the question equivalent to, for collections of sets $\mathcal{S} = {S_1, \ldots, S_n}$ and $\mathcal{T} = {T_1, \ldots, T_m}$, find $\mathcal{S}$ and $\mathcal{T}$ given $\mathcal{S} \cap \mathcal{T} = {S_i \cap T_j}$? almost always the solution is not unique, and also there is always a trivial solution where $\mathcal{T} = {U}$ where $U$ is the universe of possible elements. what are you looking for? — Sasho Nikolov, Oct 16 '12 at 02:54
if the analogy is with factoring, are you looking for any non-trivial solution to the above problem? — Sasho Nikolov, Oct 16 '12 at 02:56
$C$ is given! its full dimensions and contents are known/given. yes it can be seen as a 2d matrix with arbitrary row or column reordering allowed. the "horizontal dimension" (length) of all the bitvectors is given as the same as that of $C$. — vzn, Oct 16 '12 at 03:24
it may seem odd but its actually quite "central" in some key complexity theory proofs. yes exactly, collections of sets or equivalently, hypergraphs. each $a_i$ and $b_j$ is a bitvector. will accept any meaningful ref incl reduction to known problems etc... maybe more precisely/formally, $C = {c: c= a_i \vee b_j, i < |A|, j < |B|}$... if there is enough interest (say maybe 5 upvotes or more/meaningful answers) will post an example generated by ruby code & some random sets. dont want to reveal "punchline" early on to try to generate independent ideas. will reveal eventually. — vzn, Oct 16 '12 at 03:34
what about the fact that there is always a trivial solution or that the solution is not unique? — Sasho Nikolov, Oct 16 '12 at 03:49
would like some way to characterize the "nontrivial" solutions or formulate the problem such that trivial solutions are excluded.... nontrivial is not yet precisely defined... inexact solutions might be interesting but the exact case is the important case... have an interesting conjecture on the nontrivial factoring but it wont make sense until "trivial" is defined well... hint, it is not too hard to (P-time) reduce it all to a big SAT problem but that doesnt seem to directly help to reveal the inner structure.... — vzn, Oct 16 '12 at 14:57
Vote to close, I still cannot understand the question. Please use more standard terminology, and phrase the question clearly. — Yuval Filmus, Oct 16 '12 at 15:02
@yuval there is more than one way to phrase it. sorry the notation is not clear & that it wasnt perfect on 1st shot. it is a well defined problem but maybe not written well however think that people are also not reading it carefully. open to suggestions on formulation. have clarified it in the comments. — vzn, Oct 16 '12 at 15:23
clarification, here is more on "bitwise or" of two bitvectors — vzn, Oct 16 '12 at 15:49
i ventured to clarify, hope i kept the original intention. i would not normally do such a thorough re-edit, but vzn is not usually one to respond to requests for edits. — Sasho Nikolov, Oct 16 '12 at 16:45
was on the verge of editing but had no specific suggestions so far. agreed your edit indeed improves clarity & preserves the intent. except the idea of defining a/the "trivial case". maybe closer but does not seem to work right. $1^n$ bitwise or'd with any vector is $1^n$. did you mean $0^n$? that works right? — vzn, Oct 16 '12 at 20:06
... Repeating a (now deleted because of meta-discussion) comment — what are these deep proofs in complexity theory where you find that this operation occurs? Assuming my understanding of your problem is correct, it doesn't surprise me that it may play a role in some proofs, as outer products are a natural operation; but it would be nice to know your motivations for this problem, unless it is essentially an exercise in pure algebraic logic. — Niel de Beaudrap, Oct 17 '12 at 18:32
note to audience— lots of prior meta comments deleted by mutual agreement although its against my style/pref to do that. unf, apparently got ~3 downvotes on question after the diagram was added & on subsequent edits, dont know the reason. may add my own analysis to the problem as answer at some pt if there are more upvotes but for now, seems like audience is not too enthusiastic for the topic right now & dont want to push against voting. but still think problem is very important & would like to continue discussion in chat or elsewhere if someone initiates it. planning on revisiting at some pt. — vzn, Oct 17 '12 at 20:56
@niel (1) dont really care about the distinction between multisets or sets for this problem & think it simplifies it to just have sets (2) not following why you defined addition in the ring & it doesnt seem to be used elsewhere...? (3) like the $A B^T$ idea a lot & makes me think/wonder there could be some connection to matrix SVD...? — vzn, Oct 18 '12 at 01:04
(1) Multisets are necessary if you want $|C| = |A|\cdot|B|$. If not, then as per Suresh's initial comment, you can get $|C| \leqslant |A|\cdot|B|$ (if all the elements of $A$ and $B$ are distinct but give rise to repeated elements in $C$, which we only count once each) or even $|C| \geqslant |A|\cdot|B|$ (in case $A$ and $B$ do have repeated elements, which we use in different combinations to create multiple possibly distinct elements of $C$). If you want any particular relation between the sizes of the collections at all, definiteness is essential. — Niel de Beaudrap, Oct 18 '12 at 01:13
(2) I defined it as a ring only in order for the linear algebra to be well-defined, and in particular to avoid complaints about whether or not my presentation of $A$ and $B$ are "vectors". You are correct that for an outer product it's irrelevant. (3) You should be warned that singular value decompositions are a property of real-valued matrices. If you want to play with those, you do have to care about the addition operation; not to mention the fact that the coefficients are bitstrings and not real numbers. If you substitute bitstrings with diagonal matrices, maybe you get somewhere. — Niel de Beaudrap, Oct 18 '12 at 01:17
... Whether you deal with multisets, or whether or not the formulation in terms of outer products is likely to be informative (SVD or no), will tend to depend on what this operation is used for analytically. Are you prepared yet to describe what role you have seen it play, or think that it must play, in complexity theory? Otherwise, this may be better suited for Math.SE than here, as this is basically just playing around with structures without yet any idea even of whether the problem is easier or harder than factorization. — Niel de Beaudrap, Oct 18 '12 at 01:19
@niel its in monotone circuit proofs.. planning on writing up more details in a blog.. — vzn, Oct 19 '12 at 22:17
The current formulation admits the trivial solution $r=m, s=1$. Even this does not always lead to a unique factorisation: the singleton factor can be any bit vector that is dominated by each of the vectors in the other factor. Should the question not require nontrivial factors? — András Salamon, Oct 20 '12 at 17:27
yes it is simple to "factor out" any $1$ bit from $C$ (ie every element of $C$ has its $n$'th bit as $1$, isnt that the case you are referring to? so yes maybe the problem should point out there are no $1$ bits common to $n$'th bits of all elements of $C$. — vzn, Oct 20 '12 at 22:18
@vzn: Your condition seems sensible, and would stop multiple trivial factorisations. However, there is still the trivial solution $A=C$ and $B$ containing a single bit vector which is all zeroes. — András Salamon, Oct 21 '12 at 23:40
@niel as promised more details re the connection to monotone circuits. basically it is intrinsic to determining exact error due to approximations in lower bounds monotone circuit approximation proofs although this is not yet widely recognized. feel free to rip it into tiny little pieces =) — vzn, Dec 10 '12 at 17:05
@András this operation is "hidden" also in eg the Rossman paper on monotone circuit lower bounds which youve helpfully cited in various places eg tcs.se & your blog (have been focusing some on that paper while developing this) — vzn, Dec 10 '12 at 17:09
@sasho fyi actually note 1st rev was simplification (of the real problem) that assumed that the grid is rectangular and $|A|,|B|$ could be found by factoring the size of the multiset $|C|$. in real problem as suresh pointed out, the multisets and repeated "joins" mean that $|A|,|B|$ will only be "near" factors of $|C|$. last rev above by NdB still doesnt quite reflect that. original motivation is as a key operator in monotone circuit decomposition. wonder what you think — vzn, Dec 12 '12 at 20:01

Niel de Beaudrap · Accepted Answer · 2012-10-20T23:42:04.250

Passing to the dual hypergraphs to look for products.

As you mention in the comments, we may interpret the bitstrings contained in $A$, $B$, and $C$ as edges in hypergraphs.

Vertices are the elements of $[n]$;
An edge $x \in \{0,1\}^n$ contains a vertex $v \in [n]$ if $x_v = 1$.

(Because these are multi-sets, they're actually multi-hyper-graphs. We may consider the edges to have some generic labels to distinguish them.) We may consider the dual hypergraph $H$ in which the vertices and the edges are interchanged; that is, where

The (labelled) bitstrings $x \in \{0,1\}^n$ now distinct vertices, and
A bitstring $x \in \{0,1\}^n$ belongs to an edge represented (that is, labelled) by $e \in [n]$ if and only if $x_e =1$;

— that is, a bitstring $x \in \{0,1\}^n$ describes all of the edges which are incident to it; of which there are always $n$ in this graphical model, albeit some of them may be the "empty edge". This puts us in a much better position to describe the hypergraph as a product in some conventional manner. The hypergraph $H_C$ obtained from a multiset $C$ now has a composite number of vertices $m$: we would like hypergraphs $H_A$ and $H_B$ such that $V(H_C)$ has some particular bijective correspondance with $V(H_A) \times V(H_B)$.

A diagonally restricted tensor product of hypergraphs.

Consider the alternative formulation of the problem, replacing bit-wise OR with bit-wise AND, alluded to in the recent revision of the problem. (To do this, we take the bitwise complement of all of the bitstrings.) The Cartesian bit-wise meet condition (rather than "join", as we're using "$\wedge$" now in place of "$\vee$") corresponds to the existence of a particular bijection $$\varphi: V(H_A) \times V(H_B) \to V(H_C)$$ such that the edge labelled $e \in [n]$ in $H_C$ contains only the vertices $\varphi(a,b) \in V(H_C)$ such that both $a \in V(H_A)$ and $b \in V(H_B)$ are contained in the respective edges labelled by $e$ in $H_A$ and $H_B$. This seems somewhat different from most graph products, in that we do not take combinations of arbitrary edges, but only those with consistent labels.

If we suspended this label consistency condition, the join construction would correspond to a hypergraph tensor product (generalizing the tensor product of graphs, also commonly known as the direct product in the literature), in which we define edge-pairs $(e,f) \in E(H_A) \times E(H_B)$ which contain all vertices $(a,b) \in V(H_A) \times V(H_B)$ such that $a \in e$ and $b \in f$. This glosses over some issues to do with edges which contain $(a,b)$ and $(a,b')$ for a fixed co-ordinate $a$; but the construction we consider is appropriate to hypergraphs, taking the tensor product of the incidence matrices (characteristic functions of the edges, ranging over the vertices) to form the incidence matrix of $H_A \otimes H_B$.
To re-obtain the Cartesian bit-wise join of $H_A$ and $H_B$, we must restrict to the diagonal edges $(e,e) \in E(H_A) \times E(H_B) = [n] \times [n]$, which we re-label by $(e,e) \mapsto e$. Note that $H_A$ and $H_B$ are edge-labelled hypergraphs, as their edges are non-negotiably labelled by the elements of $[n]$ from the formulation of the vertices as bitstrings with a fixed order. (Of course, when we are searching for hypergraphs $H_A$ and $H_B$, re-labelling the edges simply corresponds to considering a different hypergraph.) Selecting the diagonal is a plausible restriction, if somewhat unnatural in the context of graph products.

The resulting reformualtion.

From this, asking "does $C$ arise from the Cartesian bitwise join of $A$ and $B$?" is equivalent to asking:

Is an edge-labelled hypergraph $H_C$ with $m$ vertices and at msot $n$ edges, with distinct edge-labels selected from $[n]$ isomorphic to the diagonal subgraph (i.e. taking only the diagonally labelled edges) of the tensor product $H_A \otimes H_B$ of two labelled hypergraphs $H_A$ and $H_B$, which also have at most $n$ edges each and also have edges labels over $[n]$?

I'm not sure if you're going to come closer to a natural graph product formulation of than this. I rather suspect that if this does play a deep role in computational complexity that someone has studied something like this; but without knowing what areas of complexity theory it is pertinent to, I would not be able to tell you where to look, beyond perhaps investigating tensor products of hypergraphs.

+1 for considering the dual hypergraph which was just musing minutes ago might work & looks promising, am suspecting might make the hypergraph product theory (some good refs on that) relevant. found suresh's ref to the dual hypergraph based on reading your Q/A list & your question on hypergraph coloring. hope to have more comments after further review.... — vzn, Oct 17 '12 at 18:53
still have to dig into this but this is a ref turned up on hypergraph products earlier (before posting question) which has a somewhat new yet somewhat fleshed out theory incl a prime factorization thm! what do you think, does any of this fit? hypergraph products by gringmann .. note in all cases (there are a bunch) it uses a cartesian product of vertex sets. — vzn, Oct 17 '12 at 20:58
@vzn: I think that in just about any (hyper)graph product of note, the vertex set will be the cartesian product of the vertex sets of the factors; that was my motivation for considering the dual hypergraph originally. As to unique "prime" factorization, I think you must look for products which lack that property, as the Cartesian bitwise meet/join certainly lack it. (Consider $A_1 = {110,001}$ and $B_1 = {011,101}$ versus $A_2 = {111,000}$ and $B_2 = {011,101}$; both yield a join of $$C = \begin{bmatrix} 111 & 111 \ 011 & 101 \end{bmatrix}$$ expressed as an outer product.) — Niel de Beaudrap, Oct 17 '12 at 21:35
as written on your now-deleted answer, am actually interested in the question/case of $C$ that are provably "prime" wrt factorization. conjecture that "important" $C$ exist that are "prime" wrt "cartesian bitwise join" factoring — vzn, Oct 17 '12 at 22:47
My post didn't provide any sort of uniqueness guarantees. To "factor" the matrix $C$ in my previous comment, one would take as factors the bit-wise AND across the rows and columns, obtaining $A = {111,001}$ and $B = {011, 101}$, which you might notice is yet a third factorization. (I expect that it is an accident of the example that I've chosen that the row-vector factor is the same in each one.) And this is just with a single mapping of elements to coefficients; for larger $C$ there may be feasible factorizations for different mappings. It's not clear that you can prevent this in any way. — Niel de Beaudrap, Oct 18 '12 at 00:34
@vzn: I don't think it's a "product" in its own right; my aim was to relate it as closely to some product (e.g. the tensor product) as I could. In order to be a product, it should do things such as commute over disjoint, disconnected unions of hypergraphs; but because the product is only defined for pairs of hypergraphs with the same number of edges, it is not clear how such a property could hold. Not to mention, come to think of it, that one cannot take the join of an arbitrary pair of hypergraphs. — Niel de Beaudrap, Oct 18 '12 at 15:49
am looking into converting problem to some std hypergraph product in the gringmann ref but cant figure it out right now. the dual idea seems very important but alone is not enough. it does lead to a problem with the right dimensions (in vertices) as you note, but maybe not edge dim ever (because of all the products given, they seem to have $|e| \geq |e_A| + |e_B|$....) maybe it is basically a new hypergraph product? or something else? it takes two hypergraphs $H_A, H_B$ with $r,s$ vertices and $|e_A|=|e_B|=n$ edges each, gives a new hypergraph with $r \cdot s$ vertices but still $n$ edges? — vzn, Oct 18 '12 at 15:57
new wild idea. could it have something to do with a bipartite or n-partite hypergraph? ... — vzn, Oct 18 '12 at 17:56
A small comment. Although for some reason Wikipedia has ended up with an article called "Tensor product of graphs", the actual term in common usage is "direct product" or "categorical product". — András Salamon, Oct 20 '12 at 17:12
@AndrásSalamon: that seems to vary by the sub-community. Chris Godsil for instance, and many researchers dealing with such topics in the penumbra of the quantum information or quantum chemistry community, refer to it as a tensor product of graphs. — Niel de Beaudrap, Oct 20 '12 at 18:04
@Niel de Beaudrap: that may be so, but the standard references on graph products (the books by Imrich/Klavžar) use direct product. Tensor product seems to be some kind of historical usage that has been adopted again by accident. — András Salamon, Oct 20 '12 at 21:59
@Andras: it is certainly good to note synonyms which are likely to be common, in any case; I'll edit to remark on it. — Niel de Beaudrap, Oct 20 '12 at 22:05

score 4 · Answer 2 · edited Apr 13 '17 at 12:32

4

This problem is very similar to dominating set in bipartite graphs. Indeed, whenever $a_i \lor b_j = 0$, we know that $a_i = b_j = 0$. Remove all these inputs and constraints. Now we are left with sets $A',B'$ and edges containing vertices in $A'$ to vertices in $B'$. Any solution to this system of equations corresponds to a dominating set in this bipartite graph.

Finding a solution of minimal Hamming weight is the same as solving minimum dominating set, which is known to be NP-hard even for planar bipartite graphs of maximum degree 3, as shown by Jukka Suomela in this very site.

edited Apr 13 '17 at 12:32

Community

1

answered Oct 15 '12 at 22:44

Yuval Filmus

14,445
1
48
92

thx but dont follow & think it might not fit; your comment above is not correct, didnt mean $|\cdot|$ as length of the bitvector but as size of the bitvector set (number of bitvectors in the set). $C$ is given but not $A,B$. if you think this is right can you sketch out how this applies for a given $C$? am not saying my problem statement is free of ambiguity, there might be a clearer way to state it.... – vzn Oct 15 '12 at 22:59

Factoring Cartesian bitwise join of bit vectors

2 Answers2