If there's enough memory, lookup tables can be built before the motion starts, and reading from them is really fast.
A third- or fourth-order polynomial (spline) could be fit for each axis and would probably work for each move segment. The math to do the fit is more intensive, but it scales nicely to many axes, and evaluating the polynomial is much faster than calling trig functions.
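To make the polynomial idea concrete, here is a minimal sketch for one axis. It assumes the segment is pinned down by its start and end positions and velocities (a cubic Hermite fit), which is only one possible way to do the fit. The struct and function names are made up for the example.

```c
/* Coefficients of p(t) = c0 + c1*t + c2*t^2 + c3*t^3 for one axis. */
typedef struct {
    float c0, c1, c2, c3;
} cubic_t;

/* Fit step (done once per move segment, per axis).
   Assumption: the segment runs over t in [0, T] and must match
   position p0 and velocity v0 at t = 0, and p1 and v1 at t = T. */
cubic_t cubic_from_endpoints(float p0, float v0, float p1, float v1, float T)
{
    cubic_t c;
    float d = p1 - p0;
    c.c0 = p0;
    c.c1 = v0;
    c.c2 = (3.0f * d - (2.0f * v0 + v1) * T) / (T * T);
    c.c3 = (-2.0f * d + (v0 + v1) * T) / (T * T * T);
    return c;
}

/* Evaluation step (done every update): Horner's rule,
   three multiplies and three adds, no trig. */
float cubic_eval(const cubic_t *c, float t)
{
    return ((c->c3 * t + c->c2) * t + c->c1) * t + c->c0;
}
```

For more axes you would just keep one `cubic_t` per axis and evaluate each with the same `t`.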
Also, the standard trig functions usually compute to the full precision of the machine, which is way overkill for some applications. I implemented sin and cos functions using a Taylor series truncated to a few terms, and it was plenty fast for that project (http://www.me.gatech.edu/me6405/Projects/Fall03/Group1/index.html). That was on an old HC11 with a 2 MHz clock, so the Arduino should have no problem with it.
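For reference, a truncated Taylor series sin/cos can look roughly like the sketch below. This is not the code from the linked project. The four-term cutoff, the function names and the requirement that the input is already in [-pi, pi] are assumptions for the example. The angle is first folded into [-pi/2, pi/2], where four terms stay within about 2e-4 of the true value.

```c
#define PI_F 3.14159265f

/* sin(x) ~ x - x^3/3! + x^5/5! - x^7/7!, written in nested form.
   Assumption: the caller keeps x within [-pi, pi]. */
float taylor_sin(float x)
{
    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x),
       so the truncated series stays accurate. */
    if (x > PI_F / 2.0f) {
        x = PI_F - x;
    } else if (x < -PI_F / 2.0f) {
        x = -PI_F - x;
    }
    float x2 = x * x;
    /* The 1/6, 1/20 and 1/42 factors are compile-time constants,
       so no division happens at run time. */
    return x * (1.0f - x2 * (1.0f / 6.0f)
                     * (1.0f - x2 * (1.0f / 20.0f)
                     * (1.0f - x2 * (1.0f / 42.0f))));
}

/* cos(x) = sin(x + pi/2), wrapped back into [-pi, pi]. */
float taylor_cos(float x)
{
    x += PI_F / 2.0f;
    if (x > PI_F) {
        x -= 2.0f * PI_F;
    }
    return taylor_sin(x);
}
```

Dropping a term makes it faster but less accurate, so how many terms to keep depends on the precision the application actually needs.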