1. intro and course logistics

2. matrix multiplication: a major open problem; describe recent developments

3. other problems suited to the algebraic complexity framework: most of linear algebra, FFT, fast polynomial arithmetic, permanent, determinant...
   the algorithms are naturally algebraic (not using bit operations), so we study upper and lower bounds here

4. the algebraic setting
   underlying field: usually C or R, but sometimes a finite field
   a "problem" is a collection of polynomials in X_1, ..., X_n (indeterminates)
   an "algorithm" is an arithmetic circuit: fan-in 2 gates +, *, / and fan-in 1 gates X_i or scalar
   such a circuit naturally represents a polynomial (or rational function); the main complexity measure is size
   in different settings we may declare scalar mult free ("non-scalar total complexity"), or addition free ("multiplicative complexity")
   e.g. matmult: f_{ik}(A, B) = \sum_j A_{ij} B_{jk}
   e.g. DFT: f_i(X) = \sum_j w^{ij} X_j   [w an n-th root of unity]
   e.g. univariate poly mult: f_k(A, B) = \sum_{i+j=k} A_i B_j
   Note: a small algebraic circuit implies (up to precision issues if char = 0) a small Boolean circuit.
   Thus, proving lower bounds in the algebraic setting should be easier (and it is).

5. specializing the model to bilinear problems (like matmult)
   goal: show that WLOG we can assume circuits for MM have a "natural form"

   Theorem [Strassen]: a (+, *, /) circuit C of size s computing n-variate polys of degree d implies a circuit of size s * poly(d) using only (+, *) gates.
   [e.g. (how division can help): X^31 takes 7 mults in a mult-only circuit, but only 6 (*, /) operations]
   Proof: each output gate computes a polynomial f.
   I. transform to a circuit with a single division gate, f = h/g at the top:
      duplicate each gate, keeping track of numerator and denominator at each gate
      identities: a/b + c/d = (ad + cb)/bd [+ gate], (a/b)(c/d) = ac/bd [* gate], (a/b)/(c/d) = ad/bc [/ gate]
      size increase: mult s by a constant
   II. f = h/g. Field large -> there exist a_1, ..., a_n such that g(a_1, ..., a_n) \ne 0.
      translate by -a and mult by a scalar so that g(0, 0, ..., 0) = 1
      size increase: mult s by a constant
      1/(1 - X) = 1 + X + X^2 + X^3 + ...
      Proof: (1 - X)(1 + X + X^2 + ...) = 1. The same holds when X is replaced by g(X) with g_0 = 0 (so the infinite sum is well-defined as a power series).
      Thus: 1/g = 1/(1 - (1 - g)) = 1 + \sum_{i \ge 1} (1 - g)^i
      Thus in our circuit so far, we can replace f = h/g with f = \sum_{i \ge 0} h(1 - g)^i.
      BUT f is a polynomial of degree d. So the homogeneous parts on both sides must match, and only those of degree \le d are non-zero.
      ALSO: g_0 = 1, so (1 - g) has no constant term. Thus the minimal-degree term in (1 - g)^j has degree j.
      [so we only care about the powers (1 - g)^j with j \le d, and only the homogeneous parts of degree \le d; a small numeric sketch of this series trick appears after this section]
   III. from circuits for h, g, obtain a circuit computing p_j = h(1 - g)^j for j = 0, ..., d
      size increase: add O(d) for each p_j (d+1 of them)
      replace the circuit computing p_j with a circuit computing its homogeneous parts p_j^(0), ..., p_j^(d)
      size increase: mult by O(d^2)
      now form f by summing the homogeneous parts 0, ..., d
   END OF PROOF.
   Slightly more careful: the extra multiplicative cost is (d choose 2).
   Moral: for low-degree problems (like matmult), we don't need to bother with / gates (for upper or lower bounds).
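A minimal numeric sketch of the step II/III idea, in the univariate case for concreteness (the code and helper names here are illustrative, not from the lecture): with g(0) = 1 we have 1/g = \sum_{i \ge 0} (1 - g)^i as a power series, and since (1 - g)^j has minimal degree j, truncating everything at degree d recovers h/g exactly whenever it is a polynomial of degree \le d.

```python
# Polynomials are coefficient lists, lowest degree first (an assumed
# representation, chosen just for this sketch).

def poly_mul(p, q, d):
    """Product of p and q, truncated to degree d."""
    r = [0] * (d + 1)
    for i, pi in enumerate(p[:d + 1]):
        for j, qj in enumerate(q[:d + 1 - i]):
            r[i + j] += pi * qj
    return r

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def divide_via_series(h, g, d):
    """Compute h/g as sum_{j=0}^{d} h*(1-g)^j, truncated to degree d.

    Assumes g[0] == 1 (step II arranges this) and that h/g is a
    polynomial of degree <= d (step III's homogeneous-parts argument).
    """
    one_minus_g = [1 - g[0]] + [-c for c in g[1:]]   # 1 - g, constant term 0
    term = h[:d + 1] + [0] * max(0, d + 1 - len(h))  # h * (1 - g)^0
    total = term[:]
    for _ in range(d):                        # powers j = 1..d; higher powers
        term = poly_mul(term, one_minus_g, d)  # contribute nothing below X^{d+1}
        total = poly_add(total, term)
    return total

# Example: h = (1 + X)^2 (1 + 2X), g = 1 + 2X, so h/g = (1 + X)^2.
print(divide_via_series([1, 4, 5, 2], [1, 2], 2))  # [1, 2, 1]
```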
***
Next: for quadratic problems, the form of the circuit can be assumed to be a [quadratic circuit].
[might think that computing higher-degree terms and having them cancel could help... we don't know that it can't help for degree > 2, but we know it can't for degree 2]

Let L(f_1, ..., f_n) denote the multiplicative complexity of computing quadratic polys f_1, ..., f_n.

Theorem [Strassen]: L(f_1, ..., f_n) = min{m : \exists linear forms g_1, ..., g_m, h_1, ..., h_m such that f_1, ..., f_n \in span(g_1 h_1, ..., g_m h_m)}.
Proof: of course \le holds; we will prove \ge holds.
Start with a (*, +) circuit for f_1, ..., f_n.
Fix a pre-order traversal of the multiplication gates. Let s_1, ..., s_\ell be the polys computed at these gates [numbered in this order].
Each s_i is the product of a_i and b_i, each of which = constant + linear combo of the X's + linear combo of s_1, ..., s_{i-1}.
[notation: f^(k) is the degree-k homogeneous part of f]
Claim: s_i^(2) is in the span of a_1^(1) b_1^(1), ..., a_i^(1) b_i^(1).
Proof by induction on i. Note that s_i^(2) = a_i^(0) b_i^(2) + a_i^(1) b_i^(1) + a_i^(2) b_i^(0).
Base case: trivial, since a_1^(2) = b_1^(2) = 0 ["no quadratic part"].
Consider s_i = a_i b_i. What contributes to the quadratic part?
   constant * (linear combo of s_j^(2) : j < i)
   + (linear combo of s_j^(2) : j < i) * constant
   + a_i^(1) b_i^(1)
By induction, the first two are just linear combos of the a_j^(1) b_j^(1); conclude s_i^(2) = linear combo of a_j^(1) b_j^(1) : j \le i [as required].
END OF PROOF (of claim).
Since f_1, ..., f_n are quadratic, they must be in the span of the s_j^(2)'s...
END OF PROOF.

***
Next: for bilinear problems, we can assume a bilinear circuit at the cost of possibly an extra factor of 2.
A bilinear function f(X, Y) is linear in both arguments [i.e. f(\alpha X + \beta X', Y) = \alpha f(X, Y) + \beta f(X', Y), and same for Y].
Matrix multiplication is bilinear (so is polynomial multiplication, etc...).
So f_{ik} = \sum_j A_{ij} B_{jk} [matmult] is a special kind of quadratic map on {A_{ij}, B_{jk}} where we never have A*A or B*B terms [all individual degrees are 1].
By the above Theorem, we can assume a quadratic circuit, i.e. a linear combo of the g_\ell h_\ell for linear forms g_\ell, h_\ell.
Write g_\ell = g_\ell(A-part) + g_\ell(B-part) and h_\ell = h_\ell(A-part) + h_\ell(B-part).
Then f_{ik} is in the span of the g_\ell(A-part) * h_\ell(B-part) and h_\ell(A-part) * g_\ell(B-part).
So: a *bilinear* circuit with multiplicative complexity at most a factor of 2 larger.
MORAL: even in the completely general model of computation, we can assume that circuits for MM with *nearly* minimum multiplicative complexity are BILINEAR!

6. bilinear circuits and tensor rank
Three ways of expressing a bilinear problem:
   1. a collection of bilinear forms f_1(X, Y), ..., f_n(X, Y)
   2. a trilinear form: \sum_i f_i(X, Y) Z_i
   3. a tensor T \in C^{|X|} \otimes C^{|Y|} \otimes C^n with T[i, j, k] = coeff on X_i Y_j Z_k in the trilinear form
      [so f_i(X, Y) = \sum_{s,t} T[s, t, i] X_s Y_t]
Measure the complexity of #1 by the multiplicative complexity of a bilinear circuit = smallest r such that each f_i is in the span of a_j(X) b_j(Y), j = 1, ..., r.
Measure the complexity of #3 by rank(T) = smallest r such that T = \sum_{j=1}^r a_j \otimes b_j \otimes c_j.
Proposition: the two measures are equal.
Proof:
(rank \le mult. complexity) define c_j(Z) = \sum_i \alpha_{ij} Z_i, where \alpha_{ij} is the coeff of a_j(X) b_j(Y) in the linear combo that equals f_i.
   Clearly \sum_j a_j(X) b_j(Y) c_j(Z) = the target trilinear form, and \sum_j a_j \otimes b_j \otimes c_j has in position [s, t, u] the coeff on X_s Y_t Z_u.
(mult. complexity \le rank) from a rank-r tensor decomposition, we have trilinear form = \sum_{j=1}^r a_j(X) b_j(Y) c_j(Z).
   Have the circuit compute a_j(X) b_j(Y) for j = 1, ..., r; the coeff on Z_i (= f_i(X, Y)) is a linear combo of these. [a small numeric sketch of this direction appears after this section]
END OF PROOF.
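To make the (mult. complexity \le rank) direction concrete, a minimal sketch (mine, not the lecture's; assumes numpy, and uses the familiar rank-3 / Karatsuba decomposition of degree-1 polynomial multiplication as the example tensor):

```python
import numpy as np

# Rows of A, B, C are the vectors a_j, b_j, c_j of a rank-r decomposition
# T = sum_j a_j (x) b_j (x) c_j.  Example (Karatsuba): multiplying
# X_0 + X_1 t by Y_0 + Y_1 t via the r = 3 products
#   X_0*Y_0,  X_1*Y_1,  (X_0 + X_1)*(Y_0 + Y_1).
A = np.array([[1, 0], [0, 1], [1, 1]])              # a_j over X = (X_0, X_1)
B = np.array([[1, 0], [0, 1], [1, 1]])              # b_j over Y = (Y_0, Y_1)
C = np.array([[1, -1, 0], [0, -1, 1], [0, 1, 0]])   # c_j over Z = (Z_0, Z_1, Z_2)

# Sanity check: this really is the polynomial-multiplication tensor,
# T[s, t, u] = 1 iff s + t = u.
T = np.einsum('js,jt,ju->stu', A, B, C)
assert all(T[s, t, u] == (s + t == u)
           for s in range(2) for t in range(2) for u in range(3))

def eval_from_decomposition(A, B, C, x, y):
    """Evaluate all outputs f_i(x, y) with exactly r non-scalar mults."""
    products = (A @ x) * (B @ y)   # the r products a_j(x) * b_j(y)
    return C.T @ products          # each f_i is a linear combo of them

# (2 + 3t)(4 + 5t) = 8 + 22t + 15t^2
print(eval_from_decomposition(A, B, C, np.array([2, 3]), np.array([4, 5])))  # [ 8 22 15]
```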
7. rank of matrix multiplication
Denote by <n, m, p> the tensor for n x m by m x p matrix multiplication:
   <n, m, p> = \sum_{i \in [n], j \in [m], k \in [p]} X_{ij} Y_{jk} Z_{ki}
Fact (obvious from the definition of rank): R(<n, m, p>) = R(<n', m', p'>) for any permutation (n', m', p') of (n, m, p) [an advantage of the symmetric tensorial notation].

Product of tensors: t \otimes t' [picture]. As trilinear forms:
   (\sum t[i, j, k] X_i Y_j Z_k)(\sum t'[i', j', k'] X_{i'} Y_{j'} Z_{k'}) = \sum t[i, j, k] t'[i', j', k'] X_{i,i'} Y_{j,j'} Z_{k,k'}
   [rename X_i X_{i'} = X_{i,i'} etc.]
For matmult: <n, m, p> \otimes <n', m', p'> = <nn', mm', pp'>:
   (\sum_{i,j,k} X_{i,j} Y_{j,k} Z_{k,i})(\sum_{i',j',k'} X_{i',j'} Y_{j',k'} Z_{k',i'}) = \sum_{i,i',j,j',k,k'} X_{(i,i'),(j,j')} Y_{(j,j'),(k,k')} Z_{(k,k'),(i,i')}
Proposition: R(t \otimes t') \le R(t) R(t').
Proof: manipulation of symbols.

***

Definition: \omega = inf{\alpha : R(<n, n, n>) \le O(n^\alpha)}.
Proposition: R(<n, m, p>) \le r implies \omega \le 3 \log_{nmp} r.
Proof: R(<n, m, p> \otimes <m, p, n> \otimes <p, n, m>) = R(<N, N, N>) \le r^3 [N = nmp],
   so R(<N^i, N^i, N^i>) \le r^{3i} = N^{i(3 \log_N r)}.
Theorem: the total arithmetic complexity of MM is at most O(n^{\omega + \epsilon}), for all \epsilon > 0.
Proof: recursive algorithm, like Strassen's.

8. R(<2, 2, 2>) \le 7
[picture of 4 target slices] Figure it out.
Known upper and lower bounds for R(<n, m, p>) with n, m, p \in {2, 3}: page 29 of Bläser.
[a sketch of the resulting recursive algorithm appears below]

COMING UP: border rank, asymptotic sum inequality (half of the additivity conjecture of Strassen).
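For concreteness, a minimal sketch (assuming numpy) of the recursive algorithm behind the theorem, using Strassen's standard seven products witnessing R(<2, 2, 2>) \le 7: applying the decomposition recursively to 2 x 2 blocks multiplies N x N matrices (N a power of 2) in O(N^{log_2 7}) = O(N^{2.81}) operations, matching \omega \le 3 \log_8 7 = \log_2 7 from the proposition with r = 7, n = m = p = 2.

```python
import numpy as np

def strassen(A, B):
    """Multiply square matrices whose size is a power of 2, using 7
    recursive multiplications per level instead of 8."""
    n = A.shape[0]
    if n == 1:
        return A * B
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    # Strassen's seven products (a rank-7 decomposition of <2,2,2>):
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    # The four target slices are linear combinations of M1..M7:
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])

A = np.random.randint(0, 10, (8, 8))
B = np.random.randint(0, 10, (8, 8))
assert (strassen(A, B) == A @ B).all()
```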