##### GP_NEB_AIE.py
##### Copyright: Olli-Pekka Koistinen, Aalto University, 9.7.2020
#####
##### This is the simpler all-images-evaluated (AIE) version of the
##### GP-NEB algorithm for finding a minimum energy path and a saddle
##### point between two minimum points.
##### The relaxation of the path on the approximated energy surface
##### is done according to the nudged elastic band (NEB) method with
##### a climbing image option for the highest-energy image.
##### After each relaxation phase, the energy and gradient are acquired
##### for all the intermediate images, and the GP hyperparameters are
##### reoptimized.
#####
##### Input:
#####   pot_general        accurate potential and gradient function
#####                        (takes 'N_im' images as ndarray of shape 'N_im' x 'D',
#####                         and returns the potential energy at those images as ndarray of shape 'N_im' x 1
#####                         and the gradient of the potential energy as ndarray of shape 'N_im' x 'D')
#####   R_init             coordinates for the images on the initial path (ndarray of shape 'N_im' x 'D')
#####   method_step        a function defining the following step during path relaxation (see, e.g., 'utils.step_QMVelocityVerlet')
#####   kernel_list        kernels for the GPy model
#####   likelihood_list    likelihoods for the GPy model
#####   opt                optimization settings for the GPy model hyperparameters
#####   param_step         parameters of the path relaxation method (shape depends on 'method_step') [default 0.01]
#####   k_par              parallel spring constant [default 1.0]
#####   T_MEP              final convergence threshold (the algorithm is stopped when the accurate norm of
#####                        NEB force is less than this for all images) [default 0.1]
#####   T_CI               additional final convergence threshold for the climbing image [default 0.1]
#####   T_CIon_gp          preliminary GP convergence threshold after which the climbing image
#####                        mode is turned on during relaxation phase (use 0 if CI not used at all) [default 0.0]
#####   divisor_T_MEP_gp   if this option is set on (> 0), the convergence threshold for a relaxation phase
#####                        is 1/'divisor_T_MEP_gp' of the smallest accurate norm of NEB force obtained so far
#####                        on any of the intermediate images, but not less than 1/10 of the lowest final threshold
#####                        (otherwise the GP convergence threshold is always 1/10 of the lowest final threshold) [default 10.0]
#####   disp_max           maximum displacement of an image from the nearest observed data point
#####                        relative to the length of the initial path
#####                        (the relaxation phase is stopped if 'disp_max' is reached for any image) [default 0.5]
#####   num_bigiter_init   number of outer iterations started from the initial path 'R_init' [default 1]
#####                      - Until 'num_bigiter_init' is reached, each relaxation phase is started from the initial path 'R_init'
#####                          (if climbing image is used, the CI phase is continued from the "preliminarily converged" evenly spaced path)
#####                      - After that, each relaxation phase is started from the latest converged path
#####                          (if climbing image is used, each relaxation phase is started
#####                           from the latest "preliminarily converged" evenly spaced path,
#####                           and the CI phase is started from the latest converged CI-path if CI is unchanged
#####                           (otherwise continued from the current "preliminarily converged" evenly spaced path))
#####   num_bigiter        maximum number of outer iterations (new sets of observations) [default 100]
#####   num_iter           maximum number of inner iterations (steps during a relaxation phase) [default 10000]
#####   num_bigiter_hess   number of outer iterations using the "virtual Hessian" around the minimum points [default 0]
#####   eps_hess           epsilon for the "virtual Hessian" [default 0.001]
#####   visualize          if set on (> 0), visualizes the true energy along the path
#####                        (requires extra evaluations, so not to be used in real applications) [default 1]
#####
##### Output:
#####   R                  coordinates for the images on the final path (ndarray of shape 'N_im' x 'D')
#####   E_R                energy at the images on the final path (ndarray of shape 'N_im' x 1)
#####   G_R                gradient at the images on the final path (ndarray of shape 'N_im' x 'D')
#####   i_CI               index of the climbing image among the intermediate images of the final path
#####   gp_model           the final GP model
#####   R_all              coordinates for all image observations (ndarray of shape 'N_obs' x 'D')
#####   E_all              energy for all image observations (ndarray of shape 'N_obs' x 1)
#####   G_all              gradient for all image observations (ndarray of shape 'N_obs' x 'D')
#####   obs_at             total numbers of inner iterations before new observations were taken
#####   E_R_acc            accurate energies of the images for each outer iteration
#####   E_R_gp             approximated energies of the images for each inner iteration
#####   normF_R_acc        accurate norm of the NEB force acting on each intermediate image for each outer iteration
#####   normF_R_gp         approximated norm of the NEB force acting on each intermediate image for each inner iteration
#####   normFCI_acc        accurate norm of the NEB force acting on the climbing image for each outer iteration (0 if CI is off)
#####   normFCI_gp         approximated norm of the NEB force acting on the climbing image for each inner iteration (0 if CI is off)
#####   param_gp           optimized GP hyperparameters for each outer iteration
#####   figs               figures

import numpy as np
import utils
import matplotlib.pyplot as plt
import GPy
import paramz
import pdb

def GP_NEB_AIE(pot_general, R_init, method_step, kernel_list, likelihood_list, opt, param_step=0.01, k_par=1.0, T_MEP=0.1, T_CI=0.1, T_CIon_gp=0.0, divisor_T_MEP_gp=10.0, disp_max=0.5, num_bigiter_init=1, num_bigiter=100, num_iter=10000, num_bigiter_hess=0, eps_hess=0.001, visualize=1):


    ###
    ### THIS INFORMATION IS ASSUMED TO BE KNOWN BEFORE BEGINNING
    ###
    
    # number of images on the path (scalar):
    N_im = R_init.shape[0]
    # dimension of the space (scalar):
    D = R_init.shape[1]
    # minimum point 1 (ndarray of size 1 x 'D'):
    min1 = R_init[:1,:]
    # length of the initial path (scalar):
    scale = 0.0
    for i in range(N_im-1):
        scale = scale + np.sqrt(np.sum(np.square(R_init[i+1,:]-R_init[i,:])))
    # energy and gradient at minimum point 1 (ndarrays of shape 1 x 1 and 1 x 'D'):
    E_min1, G_min1 = pot_general(min1)
    if E_min1.ndim < 2:
        print("ERROR: Modify your energy function so that it returns two-dimensional ndarrays (of shape 'N_im' x 1 and 'N_im' x 'D'), even if there is only one image in the input ('N_im' = 1)!")
        return
    # minimum point 2 (ndarray of shape 1 x 'D'):
    min2 = R_init[-1:,:]
    # energy and gradient at minimum point 2 (ndarrays of shape 1 x 1 and 1 x 'D'):
    E_min2, G_min2 = pot_general(min2)
    # Elevel = np.min((E_min1,E_min2)) # zero level of energy is set to the lower minimum (scalar)
    Elevel = E_min1 # zero level of energy is set to minimum point 1 (ndarray of shape 1 x 1)
    E_min1 = E_min1 - Elevel
    E_min2 = E_min2 - Elevel
    # define the "virtual Hessian" points if used:
    if num_bigiter_hess > 0:
        R_h = utils.get_hessian_points(R_init,eps_hess) 
        E_h, G_h = pot_general(R_h)
        E_h = E_h - Elevel
    else:
        R_h = np.ndarray(shape=(0,D))
        E_h = np.ndarray(shape=(0,1))
        G_h = np.ndarray(shape=(0,D))
    # coordinates of all observation points:
    R_all = np.vstack((R_h,min1,min2))
    # energy for all observation points:
    E_all = np.vstack((E_h,E_min1,E_min2))
    # gradient for all observation points:
    G_all = np.vstack((G_h,G_min1,G_min2))
    

    ###
    ### THE ALGORITHM BEGINS HERE
    ###

    # coordinates of the images (ndarray of shape 'N_im' x 'D'):
    R = R_init.copy()
    # latest evenly spaced path (no climbing image) (ndarray of shape 'N_im' x 'D'):
    R_latest_equal = np.ndarray(shape=(0,D))
    # latest climbing image index among the intermediate images
    if T_CIon_gp > 0:
        i_CI_latest = 0 
    # energy at the images (ndarray of shape 'N_im' x 1):
    E_R = np.vstack((E_min1,np.zeros((N_im-2,1)),E_min2))
    # gradient at the images (ndarray of shape 'N_im' x 'D'):
    G_R = np.vstack((G_min1,np.zeros((N_im-2,D)),G_min2))

    # GPy model:
    gp_model = GPy.core.MultioutputGP(X_list=[R_all]*(D+1),Y_list=[E_all]+np.hsplit(G_all,D),kernel_list=kernel_list,likelihood_list=likelihood_list,inference_method=GPy.inference.latent_function_inference.exact_gaussian_inference.ExactGaussianInference())

    figs = []
    if visualize > 0:
        # prepare figure for visualization of the true energy along the spline interpolation of the path:
        from scipy.interpolate import CubicSpline
        fig1 = plt.figure(1)
        #plt.label('vispath')
        csr = np.arange(0,N_im*10)/(10*N_im-1)
        plt.title('True energy along cubic spline interpolation of the path')
        plt.xlabel('image number')
        figs.append(fig1)
    
    # in case of 2D space, define a range for visualization and plot the initial path:
    if D == 2:
        scale1 = np.abs(min2[0,0]-min1[0,0])/4
        scale2 = np.abs(min2[0,1]-min1[0,1])/4
        X1, X2 = np.meshgrid(np.arange(np.min((min1[0,0],min2[0,0]))-2*scale1,np.max((min1[0,0],min2[0,0]))+scale1,scale1/20),np.arange(np.min((min1[0,1],min2[0,1]))-scale2,np.max((min1[0,1],min2[0,1]))+scale2,scale2/20))
        gp_model.optimize(optimizer=opt)
        param_gp_init = gp_model.kern[:].copy()
        gridsize = np.shape(X1)[0]*np.shape(X1)[1]
        Ef, Varf = gp_model.predict_noiseless(Xnew=[np.hstack((X1.reshape(gridsize,1),X2.reshape(gridsize,1)))])
        fig = plt.figure()
        #plt.label('vis2D')
        plt.title('Approximated energy surface in the beginning, initial path')
        plt.contourf(X1,X2,Ef.reshape(np.shape(X1)[0],np.shape(X1)[1]),100)
        plt.plot(R[:,0],R[:,1],'yo',markerfacecolor='y')
        plt.plot(R_all[:,0],R_all[:,1],'r+')
        plt.axis('equal')
        plt.axis('tight')
        plt.jet()
        plt.colorbar()
        figs.append(fig)
    
    # ndarray gathering accurate energies of the images for each outer iteration:
    E_R_acc = np.ndarray(shape=(N_im,0))
    # ndarray gathering approximated energies of the images for each inner iteration:
    E_R_gp = np.ndarray(shape=(N_im,0))
    # ndarray gathering accurate norm of the NEB force acting on each intermediate image for each outer iteration:
    normF_R_acc = np.ndarray(shape=(N_im-2,0))
    # ndarray gathering accurate norm of the NEB force acting on the climbing image for each outer iteration (0 if CI is off):
    normFCI_acc = np.ndarray(shape=(0))
    # ndarray gathering approximated norm of the NEB force acting on each intermediate image for each inner iteration:
    normF_R_gp = np.ndarray(shape=(N_im-2,0))
    # ndarray gathering approximated norm of the NEB force acting on the climbing image for each inner iteration (0 if CI is off):
    normFCI_gp = np.ndarray(shape=(0))
    # ndarray gathering the numbers of total iterations before new observations were taken:
    obs_at = np.ndarray(shape=(0))  
    # optimized GP hyperparameters for each relaxation phase:
    param_gp = np.ndarray(shape=(0,gp_model[:].shape[0]))
    
    # OUTER ITERATION LOOP
    for ind_bigiter in range(num_bigiter+1):
        
        # acquire the accurate energy and gradient on the relaxed path and add them to the data:
        R_all = np.vstack((R_all,R[1:-1,:]))
        E_R = E_R.copy()
        G_R = G_R.copy()
        E_R[1:-1,:], G_R[1:-1,:] = pot_general(R[1:-1,:])
        E_R[1:-1,:] = E_R[1:-1,:] - Elevel
        F_R, normFCI, i_CI = utils.force_NEB(R,E_R,G_R,k_par,T_CIon_gp)
        E_all = np.vstack((E_all,E_R[1:-1,:]))
        G_all = np.vstack((G_all,G_R[1:-1,:]))
        E_R_acc = np.hstack((E_R_acc,E_R))
        normF_R_acc = np.hstack((normF_R_acc,np.sqrt(np.sum(np.square(F_R),1))[np.newaxis].T))
        normFCI_acc = np.hstack((normFCI_acc,normFCI))
        obs_at = np.hstack((obs_at,E_R_gp.shape[1]))
        if T_CIon_gp > 0:
            print('Accurate values: meanE_R = {:.3g}, maxnormF_R = {:.3g}, minnormF_R = {:.3g}, normFCI = {:.3g} (image {:g})\n\n'.format(np.mean(E_R_acc[:,-1]),np.max(normF_R_acc[:,-1]),np.min(normF_R_acc[:,-1]),normFCI_acc[-1],i_CI+1))
        else:
            print('Accurate values: meanE_R = {:.3g}, maxnormF_R = {:.3g}, minnormF_R = {:.3g}\n\n'.format(np.mean(E_R_acc[:,-1]),np.max(normF_R_acc[:,-1]),np.min(normF_R_acc[:,-1])))

        # stop the algorithm if final convergence is obtained:
        if np.max(normF_R_acc[:,-1]) < T_MEP and normFCI_acc[-1] < T_CI:
            print('Final convergence obtained after {:g} relaxation phases ({:g} image evaluations).\n'.format(ind_bigiter,(N_im-2)*(ind_bigiter+1)))
            break
        
        # stop the algorithm if maximum number of outer iterations is reached:
        if ind_bigiter == num_bigiter:
            print('Stopped the algorithm: Maximum number of outer iterations ({:g}) reached.\n'.format(ind_bigiter))
            break
        
        # remove the "virtual Hessian" observations if needed:
        if num_bigiter_hess > 0 and ind_bigiter == num_bigiter_hess:
            R_all = np.delete(R_all,range(2*D),0)
            E_all = np.delete(E_all,range(2*D),0)
            G_all = np.delete(G_all,range(2*D),0)
        
        if visualize > 0:
            # visualize the true energy along the spline interpolation of the relaxed path:
            plt.figure(1)
            cs = CubicSpline(np.arange(0,N_im)/(N_im-1),R)
            R_spline = cs(csr)
            E_spline, G_spline = pot_general(R_spline)
            E_spline = E_spline - Elevel
            plt.plot(csr*(N_im-1)+1,E_spline,'r')
            plt.plot(np.arange(1,N_im+1),E_R,'o',markeredgecolor='r',markerfacecolor='r')
        
        # update the GP model and optimize the hyperparameters:
        gp_model.set_XY([R_all]*(D+1),[E_all]+np.hsplit(G_all,D))
        gp_model.optimize(optimizer=opt)
        param_gp = np.vstack((param_gp,gp_model[:]))
        
        # define the convergence threshold for the relaxation phase:
        if divisor_T_MEP_gp > 0:
            # if this option is set on, the GP convergence threshold is 1/'divisor_T_MEP_gp'
            # of the smallest accurate norm of NEB force obtained so far on any of the intermediate images,
            # but not less than 1/10 of the lowest final threshold
            T_MEP_gp = max((np.min(normF_R_acc)/divisor_T_MEP_gp,np.min((T_MEP/10,T_CI/10))))
        else:
            # otherwise the GP convergence threshold is always 1/10 of the lowest final threshold
            T_MEP_gp = min((T_MEP,T_CI))/10
        
        # define the start path for the relaxation phase:
        if ind_bigiter+1 > num_bigiter_init and R_latest_equal.shape[0] > 0:
            R = R_latest_equal.copy()
            if T_CIon_gp > 0:
                print('Started relaxation phase {:g} from the latest "preliminarily converged" evenly spaced path (no climbing image).\n'.format(ind_bigiter+1))
            else:
                print('Started relaxation phase {:g} from the latest converged path.\n'.format(ind_bigiter+1))
        else:
            R = R_init.copy()
            print('Started relaxation phase {:g} from the initial path.\n'.format(ind_bigiter+1))
        
        iters = 0
        # set climbing image mode off in the beginning:
        CI_on = 0
        # indicator of early stopping:
        not_relaxed = 0
        # velocities of the intermediate images (given as an output of the previous step):
        V_old = np.zeros((N_im-2,D))
        # NEB forces on the intermediate images on the previous path:
        F_R_old = np.zeros((N_im-2,1))
        # indicator if zero velocity used (for the first iteration):
        zeroV = 1
        
        # INNER ITERATION LOOP
        for ind_iter in range(num_iter+1):
            
            # calculate approximated energy and gradient on the new path:
            EG_R = gp_model.predict_noiseless([R]*(D+1))[0]
            E_R = EG_R[:N_im,:].copy()
            G_R = np.reshape(EG_R[N_im:,:],(D,N_im)).T.copy()
            F_R, normFCI, i_CI = utils.force_NEB(R,E_R,G_R,k_par,CI_on)
            normF_R = np.sqrt(np.sum(np.square(F_R),1)[np.newaxis].T)          

            # turn climbing image option on and correct the NEB force accordingly if sufficiently relaxed:
            if CI_on <= 0 and np.max(normF_R) < T_CIon_gp:
                R_latest_equal = R.copy()
                CI_on = 1
                i_CI_test = np.argmax(E_R[1:-1,:])+1
                print('Climbing image (image {:g}) turned on after {:g} inner iterations.\n'.format(i_CI_test+1,iters))
                if ind_bigiter+1 > num_bigiter_init and i_CI_test == i_CI_latest:
                    EG_R_test2 = gp_model.predict_noiseless(Xnew=[R_latest_climb]*(D+1))[0]
                    E_R_test2 = EG_R_test2[:N_im,:].copy()
                    i_CI_test2 = np.argmax(E_R_test2[1:-1,:])+1
                    if i_CI_test2 == i_CI_latest:
                        R = R_latest_climb.copy()
                        E_R = E_R_test2.copy()
                        G_R = np.reshape(EG_R_test2[N_im:,:],(D,N_im)).T.copy()
                        print('CI unchanged: continued from the latest converged CI-path.\n')
                F_R, normFCI, i_CI = utils.force_NEB(R,E_R,G_R,k_par,CI_on)
                normF_R = np.sqrt(np.sum(np.square(F_R),1)[np.newaxis].T)
                zeroV = 1
             
            E_R_gp = np.hstack((E_R_gp,E_R))
            normF_R_gp = np.hstack((normF_R_gp,normF_R))
            normFCI_gp = np.hstack((normFCI_gp,normFCI))
            
            # stop the relaxation phase if converged:
            if ( T_CIon_gp <= 0 or CI_on > 0 ) and np.max(normF_R) < T_MEP_gp and iters > 0:
                if CI_on > 0:
                    R_latest_climb = R.copy()
                    i_CI_latest = i_CI
                    print('Stopped the relaxation phase: converged after {:g} inner iterations (CI: image {:g}).\n'.format(iters,i_CI+1))
                else:
                    R_latest_equal = R.copy()
                    print('Stopped the relaxation phase: converged after {:g} inner iterations.\n'.format(iters))
                break

            # stop the relaxation phase if maximum number of inner iterations reached:
            if iters == num_iter:
                print('Stopped the relaxation phase: maximum number of inner iterations ({:g}) reached.\n'.format(iters))
                not_relaxed = 1
                break
               
            # move the path one step along the NEB force according to the chosen method:
            R_new, V_old = method_step(R,F_R,param_step,F_R_old,V_old,zeroV)
            zeroV = 0
            
            # reject the step and stop the relaxation phase if the distance from any current image to the
            # nearest observed data point is larger than 'disp_max' times the length of the initial path:
            if iters > 0:
                disp_nearest = np.zeros((N_im-2,1))
                for i in range(1,N_im-1):
                    disp_nearest[i-1,0] = np.sqrt(np.min(np.sum(np.square(R_new[i,:]-R_all),1)))
                maxdisp_nearest = np.max(disp_nearest)
                if maxdisp_nearest > disp_max*scale:
                    print('Stopped the relaxation phase after {:g} inner iterations: image {:g} too far from the nearest observed data point.\n'.format(iters,np.argmax(disp_nearest)+2))
                    not_relaxed = 1
                    break
                            
            # otherwise accept the step and continue the relaxation:
            iters = iters + 1
            R = R_new.copy()
            F_R_old = F_R.copy()
            
        # END OF INNER ITERATION LOOP
        
        # in case of 2D space, plot the relaxed path:
        if D == 2:
            Ef = gp_model.predict_noiseless(Xnew=[np.hstack((X1.reshape(gridsize,1),X2.reshape(gridsize,1)))])[0]
            fig = plt.figure()
            plt.contourf(X1,X2,Ef.reshape(np.shape(X1)[0],np.shape(X1)[1]),100)
            plt.plot(R[:,0],R[:,1],'yo',markerfacecolor='y')
            plt.plot(R_all[:,0],R_all[:,1],'r+')
            plt.axis('equal')
            plt.axis('tight')
            plt.jet()
            plt.colorbar()
            if not_relaxed > 0:
                plt.title('Approximated energy surface on round {:g}, path relaxation stopped early'.format(ind_bigiter+1))
            else:
                plt.title('Approximated energy surface on round {:g}, relaxed path'.format(ind_bigiter+1))
            figs.append(fig)
            
    # END OF OUTER ITERATION LOOP
    
    if visualize > 0:
        # visualize the true energy along the spline interpolation of the final path:
        plt.figure(1)
        cs = CubicSpline(np.arange(0,N_im)/(N_im-1),R)
        R_spline = cs(csr)
        E_spline, G_spline = pot_general(R_spline)
        E_spline = E_spline - Elevel
        plt.plot(csr*(N_im-1)+1,E_spline,'b',linewidth=2)
        plt.plot(np.arange(1,N_im+1),E_R,'o',markeredgecolor='b',markerfacecolor='b')
        
    return R, E_R, G_R, i_CI, gp_model, R_all, E_all, G_all, obs_at, E_R_acc, E_R_gp, normF_R_acc, normF_R_gp, normFCI_acc, normFCI_gp, param_gp, figs
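
# Illustrative sketch (not part of the original algorithm): a toy potential that
# follows the 'pot_general' contract described in the header, i.e. it takes
# 'N_im' x 'D' coordinates and returns 'N_im' x 1 energies and 'N_im' x 'D' gradients.
# The name '_example_pot_quadratic' and the quadratic form are assumptions made
# for illustration only; a real application would wrap its own energy function.
import numpy as np  # already imported at the top of this module; repeated so the sketch is self-contained

def _example_pot_quadratic(R):
    # accept a single image as well by promoting the input to two dimensions
    R = np.atleast_2d(np.asarray(R, dtype=float))
    # energy 0.5*|x|^2 for each image, kept two-dimensional ('N_im' x 1)
    E = 0.5*np.sum(np.square(R),1)[np.newaxis].T
    # the gradient of 0.5*|x|^2 is x itself ('N_im' x 'D')
    G = R.copy()
    return E, G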


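
# Illustrative sketch (not the implementation in 'utils'): a plain steepest-descent
# step with the call signature that 'GP_NEB_AIE' uses for 'method_step', i.e.
# 'R_new, V_new = method_step(R, F_R, param_step, F_R_old, V_old, zeroV)'.
# It assumes that 'R' has shape 'N_im' x 'D', that 'F_R' has shape ('N_im'-2) x 'D'
# (intermediate images only) and that 'param_step' is a scalar step length.
# The name '_example_step_steepest_descent' is hypothetical.
import numpy as np  # already imported at the top of this module; repeated so the sketch is self-contained

def _example_step_steepest_descent(R, F_R, param_step, F_R_old, V_old, zeroV):
    R_new = R.copy()
    # move only the intermediate images along the NEB force; the end points stay fixed
    R_new[1:-1,:] = R[1:-1,:] + param_step*F_R
    # steepest descent carries no momentum, so the returned velocities are zero
    # ('F_R_old', 'V_old' and 'zeroV' are accepted only to match the signature)
    V_new = np.zeros_like(F_R)
    return R_new, V_new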