Policy iteration, which evaluates and improves the control policy iteratively, is a reinforcement learning method. Policy evaluation with the least-squares method can draw more useful information from the empirical data and therefore improve the data