Bernstein et al. have proposed a new permutation, Gimli, which aims to provide simple and performant implementations on a wide variety of platforms. One of the tricks used to make Gimli performant is that it processes data mostly in 96-bit columns, only occasionally swapping 32-bit words between them.
Here we show that this trick is dangerous by presenting a distinguisher for reduced-round Gimli. Our distinguisher takes the form of an attack on a simple and practical PRF that should be nearly 192-bit secure. Gimli has 24 rounds. Against 15½ of those rounds, our distinguisher uses two known plaintexts, takes about 264 time and uses enough memory for a set with 264 elements. Against 19½ rounds, the same attack uses three non-adaptively chosen plaintexts, and uses twice as much memory and about 2128 time. Against 22½ rounds, it requires about 2138.5 work, 2129 bits of memory and 210.5 non-adaptively chosen plaintexts. The same attack would apply to 23½ rounds if Gimli had more rounds.
Our attack does not use the structure of the SP-box at all, other than that it is invertible, so there may be room for improvement. On the bright side, our toy PRF puts keys and data in different positions than a typical sponge mode would do, so the attack might not work against sponge constructions.