The Montgomery ladder and Joye ladder are well-known algorithms for elliptic curve scalar multiplication with a regular structure: they perform the same sequence of field operations for every scalar bit. The Montgomery ladder is best known for its implementation on Montgomery curves, which requires 5**M**+4**S**+1**m**+8**A** per scalar bit and 6 field registers. Here **M**, **S**, **m** and **A** denote, respectively, field **M**ultiplications, **S**quarings, **m**ultiplications by a curve constant, and **A**dditions or subtractions. This ladder is also *complete*, meaning that it works on all input points and all scalars.
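To make the regular structure and the 5**M**+4**S**+1**m**+8**A** count concrete, here is a minimal Python sketch of the x-only Montgomery ladder on a Montgomery curve, following the ladder given for X25519 in RFC 7748 with Curve25519 parameters. It is not one of the short Weierstrass formulas presented here. The function name `ladder`, the operation counters and the use of an unclamped scalar are illustrative assumptions, and the Python `if` used for the conditional swap is not constant-time.

```python
from collections import Counter

# Curve25519 parameters (RFC 7748): field prime and (A - 2) / 4 with A = 486662.
P = 2**255 - 19
A24 = 121665


def ladder(k, u, bits=255):
    """x-only Montgomery ladder: returns (x([k]Q), op counts) where x(Q) = u."""
    ops = Counter()

    def add(a, b):
        ops["A"] += 1
        return (a + b) % P

    def sub(a, b):
        ops["A"] += 1
        return (a - b) % P

    def mul(a, b):
        ops["M"] += 1
        return a * b % P

    def sqr(a):
        ops["S"] += 1
        return a * a % P

    def mul_a24(a):
        ops["m"] += 1  # multiplication by the curve constant
        return A24 * a % P

    x1 = u % P
    x2, z2 = 1, 0    # R0 = point at infinity
    x3, z3 = x1, 1   # R1 = Q; the two points always differ by Q
    swap = 0
    for t in reversed(range(bits)):
        kt = (k >> t) & 1
        swap ^= kt
        # Branch for readability only; real code uses a masked, constant-time swap.
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = kt
        # One doubling and one differential addition: identical work for every bit.
        a = add(x2, z2)
        aa = sqr(a)
        b = sub(x2, z2)
        bb = sqr(b)
        e = sub(aa, bb)
        c = add(x3, z3)
        d = sub(x3, z3)
        da = mul(d, a)
        cb = mul(c, b)
        x3 = sqr(add(da, cb))
        z3 = mul(x1, sqr(sub(da, cb)))
        x2 = mul(aa, bb)
        z2 = mul(e, add(aa, mul_a24(e)))
    if swap:
        x2, z2 = x3, z3
    return x2 * pow(z2, P - 2, P) % P, ops
```

Calling `ladder(k, 9)` for a scalar below 2^255 returns the x-coordinate of [k] times the Curve25519 base point, and the counter reports 1275 **M**, 1020 **S**, 255 **m** and 2040 **A**, i.e. exactly 5**M**+4**S**+1**m**+8**A** for each of the 255 bits.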

Many protocols do not use Montgomery curves, but instead use prime-order curves in short Weierstrass form. These have historically been much slower, with ladders costing at least 14 multiplications or squarings per bit: 8**M**+6**S**+27**A** for the Montgomery ladder and 8**M**+6**S**+30**A** for the Joye ladder. In 2017, Kim et al. improved the Montgomery ladder to 8**M**+4**S**+12**A**+1**H** per bit using 9 registers, where **H** denotes a halving. Hamburg simplified Kim et al.'s formulas to 8**M**+4**S**+8**A**+1**H** per bit using 6 registers.
Here we present improved formulas that compute the Montgomery ladder on short Weierstrass curves using 8**M**+3**S**+7**A** per bit and 6 registers. We also give formulas for the Joye ladder that use 9**M**+3**S**+7**A** per bit and 5 registers. One of our new formulas supports very efficient 4-way vectorization.
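For comparison with the Montgomery ladder, the following minimal Python sketch shows the usual right-to-left form of the Joye double-add ladder: each iteration computes R[1-b] = 2R[1-b] + R[b], so only the indices depend on the scalar bit. The toy curve y^2 = x^3 + 2x + 3 over F_97, the point (3, 6) and the names `ec_add` and `joye_ladder` are hypothetical choices for illustration. The affine group law with explicit branches for the identity is neither constant-time nor representative of the 9**M**+3**S**+7**A** formulas above; it only checks the ladder's control flow.

```python
from typing import Callable, Optional, Tuple

# Toy short Weierstrass curve y^2 = x^3 + 2x + 3 over F_97 (illustration only).
p, a, b = 97, 2, 3
Point = Optional[Tuple[int, int]]  # None is the point at infinity


def ec_add(P1: Point, P2: Point) -> Point:
    """Affine group law with explicit branches for the identity and doubling."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    x1, y1 = P1
    x2, y2 = P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P2 = -P1 (this also covers doubling a point with y = 0)
    if P1 == P2:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def joye_ladder(k: int, P: Point, bits: int,
                add: Callable[[Point, Point], Point] = ec_add) -> Point:
    """Right-to-left double-add ladder returning [k]P for 0 <= k < 2**bits."""
    R = [None, P]  # invariant: R[0] = [k mod 2^i]P and R[0] + R[1] = [2^i]P
    for i in range(bits):
        bit = (k >> i) & 1
        # One doubling and one addition per bit; only the indices vary.
        R[1 - bit] = add(add(R[1 - bit], R[1 - bit]), R[bit])
    return R[0]
```

For example, `joye_ladder(k, (3, 6), 6)` agrees with adding (3, 6) to itself k times for every k below 64. Unlike the Montgomery ladder, this variant processes scalar bits from least to most significant and needs no differential addition.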

We also discuss curve invariants, exceptional points, side-channel protection and how to set up and finish these ladder operations. Finally, we show a novel technique to make these ladders complete when the curve order is not divisible by 2 or 3, at a modest increase in cost.

A sample implementation of these techniques is given in the supplementary material, also posted at https://github.com/bitwiseshiftleft/ladder_formulas.