loopify sigmoids and exps

1% faster for nns=4, due to reduced icache pressure
