In this paper, we propose a circuit design for a floating-point operation unit based on the bit-by-bit iterative square-root algorithm, which reduces the critical path of each iteration to 14 gate delays. With the cycle time set to 14 gate delays, an IEEE single-precision floating-point operation completes in 15 cycles and a double-precision operation in 29 cycles. The paper also reviews the two main classes of algorithms currently used for square-root computation, the bit-by-bit iterative square-root algorithm and the Newton-Raphson iterative method, together with related techniques such as redundant number representation.
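To make the contrast between the two algorithm classes concrete, the following is a minimal software sketch on non-negative integers. It is not the circuit proposed in this paper: the function names `isqrt_bitwise` and `isqrt_newton` are hypothetical, the code works on plain integers rather than IEEE significands, and it does not model redundant number representation or gate delays.

```python
def isqrt_bitwise(n: int) -> int:
    """Bit-by-bit (digit-recurrence) square root: one result bit per iteration.

    Illustrative integer model only, not the paper's hardware design.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Start from the largest power of four that does not exceed n.
    bit = 1 << (n.bit_length() & ~1)
    while bit > n:
        bit >>= 2
    rem, root = n, 0
    while bit:
        trial = root + bit
        if rem >= trial:
            # The trial subtraction succeeds: this result bit is 1.
            rem -= trial
            root = (root >> 1) + bit
        else:
            # The trial subtraction fails: this result bit is 0.
            root >>= 1
        bit >>= 2
    return root


def isqrt_newton(n: int) -> int:
    """Newton-Raphson square root: x_{k+1} = (x_k + n / x_k) / 2 on integers."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Initial guess chosen to be at least sqrt(n), so the iterates decrease.
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y
```

The bit-by-bit version uses only shifts, comparisons, and subtractions and produces a fixed one bit per step, which is why it maps naturally onto an iterative circuit. The Newton-Raphson version needs a division (or, in hardware, multiplications on a reciprocal form) per step but converges in far fewer iterations.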