##### atomic_GP_NEB_OIE2.py
##### Copyright: Olli-Pekka Koistinen, Aalto University, 9.7.2020
#####
##### This is the one-image-evaluated (OIE) version of the
##### atomic GP-NEB algorithm for finding a minimum energy path and a saddle
##### point between two minimum energy configurations.
##### The relaxation of the path on the estimated energy surface
##### is done according to the nudged elastic band (NEB) method with
##### a climbing image option for the highest-energy image.
##### After each relaxation phase, the energy and gradient are acquired
##### only for the image with the highest uncertainty, and the
##### GP hyperparameters are reoptimized.
##### If the (overoptimistic) GP estimate for the maximum component of the
##### gradient perpendicular to the path remains below the convergence
##### threshold after the new evaluation, more images are evaluated without moving the path,
##### until the estimate rises above the threshold or all images have been evaluated.
#####
##### The atomic version of GP-NEB uses a special GPy covariance function 'RBF_atomic' implemented in 'rbf_atomic.py',
##### where the distance between configurations C and C' is based on the changes of the inter-atomic distances.
##### To use that kernel, GPy should be installed from:
##### https://github.com/esiivola/GPy/tree/feature-multioutput-grad-obs
##### The file 'rbf_atomic.py' should then be added to the folder 'GPy/GPy/kern/src/'
##### and the following line added to 'GPy/GPy/kern/__init__.py':
##### from .src.rbf_atomic import RBF_atomic
##### In addition, the files 'add.py' and 'static.py' should be updated in 'GPy/GPy/kern/src/'.
#####
##### Input:
#####   pot_general            accurate potential and gradient function
#####                            (takes 'N_im' images as ndarray of shape 'N_im' x 'D',
#####                             and returns the potential energy at those images as ndarray of shape 'N_im' x 1
#####                             and the gradient of the potential energy as ndarray of shape 'N_im' x 'D')
#####   conf_info              dictionary including information about the configurations necessary for the GP model
#####                           - conf_info["conf_fro"]: coordinates of active frozen atoms (ndarray of shape 'N_fro' x 3)
#####                           - conf_info["atomtype_mov"]: atomtype indices for moving atoms (ndarray of shape 'N_mov')
#####                           - conf_info["atomtype_fro"]: atomtype indices for active frozen atoms (ndarray of shape 'N_fro')
#####                           - conf_info["pairtype"]: pairtype indices for pairs of atomtypes (ndarray of shape 'n_at' x 'n_at')
#####                           - conf_info["n_pt"]: number of active pairtypes
#####   conf_info_inactive     dictionary including information about inactive frozen atoms
#####                           - conf_info_inactive["conf_ifro"]: coordinates of inactive frozen atoms (ndarray of shape 'N_ifro' x 3)
#####                           - conf_info_inactive["atomtype_ifro"]: atomtype indices for inactive frozen atoms (ndarray of shape 'N_ifro')
#####   actdist_fro            activation distance for moving+frozen atom pairs (inf if all active)
#####   R_init                 coordinates for the images on the initial path (ndarray of shape 'N_im' x 'D')
#####   method_step            a function defining the following step during path relaxation (see, e.g., 'utils.step_QMVelocityVerlet')
#####   param_step             parameters of the path relaxation method (shape depends on 'method_step') [default 0.01]
#####   method_force           a function defining the NEB force [default utils.force_NEB2]
#####   param_force            parameters of the NEB force method (shape depends on 'method_force') [default 1.0]
#####   T_MEP                  final convergence threshold for the accurate 'maxmaxG_R_perp', which is the maximum component
#####                            of gradient perpendicular to the path tangent at any of the intermediate images [default 0.1]
#####   T_CI                   additional final convergence threshold for the climbing image [default 0.1]
#####   T_CIon_gp              preliminary GP convergence threshold after which the climbing image
#####                            mode is turned on during relaxation phase (use 0 if CI not used at all) [default 0.0]
#####   divisor_T_MEP_gp       if this option is set on (> 0), the convergence threshold for a relaxation phase
#####                            is 1/'divisor_T_MEP_gp' of the smallest accurate 'maxG_R_perp' of NEB force obtained so far
#####                            on any of the intermediate images, but not less than 1/10 of the lowest final threshold
#####                            (otherwise the GP convergence threshold is always 1/10 of the lowest final threshold) [default 10.0]
#####   disp_max               maximum displacement of image from the nearest observed data point
#####                            relative to the length of the initial path
#####                            (the relaxation phase is stopped if 'disp_max' is reached for any image) [default 0.5]
#####   ratio_at_limit         limit for the ratio (< 1) of inter-atomic distances between image and its "nearest" observed data point
#####                            (the relaxation phase is stopped if 'ratio_at_limit' is reached for any image) [default 2.0/3.0]
#####   num_bigiter_initpath   number of outer iterations started from the initial path 'R_init' [default 1]
#####                          - Until 'num_bigiter_initpath' is reached, each relaxation phase is started from the initial path 'R_init'
#####                              (if climbing image is used, the CI phase is continued from the "preliminarily converged" evenly spaced path)
#####                          - After that, each relaxation phase is started from the latest converged path
#####                              (if climbing image is used, each relaxation phase is started
#####                               from the latest "preliminarily converged" evenly spaced path,
#####                               and the CI phase is started from the latest converged CI-path if CI is unchanged
#####                               (otherwise continued from the current "preliminarily converged" evenly spaced path))
#####   num_bigiter_initparam  number of outer iterations where the hyperparameter optimization is started
#####                            from values initialized based on the range of the current data [default np.inf]
#####                            (after that, the optimization is started from the values of the previous round)
#####   num_bigiter            maximum number of outer iterations (new sets of observations) [default 100]
#####   num_iter               maximum number of inner iterations (steps during a relaxation phase) [default 10000]
#####   islarge_num_iter       indicator if 'num_iter' is assumed to be much larger than required for NEB convergence on accurate energy surface [default 1]
#####                            (if not, the next relaxation phase is continued from the current path if 'num_iter' is reached)
#####   num_bigiter_hess       number of outer iterations using the "virtual Hessian" around the minimum points [default 0]
#####   eps_hess               epsilon for the "virtual Hessian" [default 0.001]
#####   load_file              path to the data file required to continue from a cancelled run ('' if started normally from the beginning) [default '']
#####   save_file              path to the data file where data is saved ('' if not saved) [default '']
#####   quatern                indicator if quaternion trick used to remove rotation/translation of system [default 0]
#####   visualize              1: visualizes the true energy along the path [default 0]
#####                            (requires extra evaluations, so not to be used in real applications)
#####
##### Output:
#####   R                      coordinates for the images on the final path (ndarray of shape 'N_im' x 'D')
#####   E_R                    energy at the images on the final path (ndarray of shape 'N_im' x 1)
#####   G_R                    gradient at the images on the final path (ndarray of shape 'N_im' x 'D')
#####   i_CI                   index of the climbing image among the intermediate images of the final path
#####   gp_model               the final GP model
#####   R_all                  coordinates for all image observations (ndarray of shape 'N_obs' x 'D')
#####   E_all                  energy for all image observations (ndarray of shape 'N_obs' x 1)
#####   G_all                  gradient for all image observations (ndarray of shape 'N_obs' x 'D')
#####   Elevel                 level of zero energy in terms of the input potential
#####   obs_at                 total numbers of inner iterations before new observations were taken
#####   E_R_ae                 accurate/estimated energies of the images after each evaluation (thus includes one or more accurate values)
#####   E_R_gp                 estimated energies of the images for each inner iteration
#####   maxG_R_perp_ae         accurate/estimated maximum component of gradient perpendicular to the path tangent at each image after each evaluation
#####   maxF_R_gp              estimated maximum component of NEB force acting on each intermediate image for each inner iteration
#####   maxG_CI_ae             accurate/estimated maximum component of gradient at the climbing image after each evaluation (0 if CI is off)
#####   maxG_CI_gp             estimated maximum component of gradient at the climbing image for each inner iteration (0 if CI is off)
#####   param_gp               optimized GP hyperparameters after each evaluation
#####   figs                   figures

import numpy as np
from scipy.stats import norm
import utils
import utils_atomic
import matplotlib.pyplot as plt
import GPy
import paramz
import pdb
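
# --- Illustrative sketch (not part of the original algorithm) ---------------
# The header above requires 'pot_general' to return two-dimensional ndarrays
# (energy of shape 'N_im' x 1, gradient of shape 'N_im' x 'D'), even for a
# single image. The hypothetical helper below shows one way to satisfy that
# interface, assuming a simple harmonic potential; the names
# '_example_pot_general', 'center' and 'k' are assumptions made only for this
# example and are not used anywhere else in this file.
import numpy as np  # repeated here only so that the sketch is self-contained


def _example_pot_general(R, center=0.0, k=1.0):
    # accept a single image given as a 1-D array by promoting it to 1 x 'D':
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = R - center
    # energy (ndarray of shape 'N_im' x 1); keepdims keeps the second axis:
    E = 0.5 * k * np.sum(diff**2, axis=1, keepdims=True)
    # gradient (ndarray of shape 'N_im' x 'D'):
    G = k * diff
    return E, G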

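# --- Illustrative sketch (not part of the original algorithm) ---------------
# The rule described for 'divisor_T_MEP_gp' in the header (convergence
# threshold of a relaxation phase) can be written as a small standalone
# function. The name '_example_relaxation_threshold' is hypothetical; the
# function is only meant to make the rule easy to check by hand and is not
# called by 'atomic_GP_NEB_OIE2'.
def _example_relaxation_threshold(smallest_acc_maxG, T_MEP=0.1, T_CI=0.1, divisor_T_MEP_gp=10.0):
    # lower bound: 1/10 of the lowest final threshold
    floor = min(T_MEP, T_CI) / 10.0
    if divisor_T_MEP_gp > 0:
        # 1/'divisor_T_MEP_gp' of the smallest accurate 'maxG_R_perp' so far,
        # but never below the lower bound
        return max(smallest_acc_maxG / divisor_T_MEP_gp, floor)
    # option off: the threshold is always the lower bound
    return floor
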
def atomic_GP_NEB_OIE2(pot_general, conf_info, conf_info_inactive, actdist_fro, R_init, method_step,
                       param_step=0.01, method_force=utils.force_NEB2, param_force=1.0,
                       T_MEP=0.1, T_CI=0.1, T_CIon_gp=0.0, divisor_T_MEP_gp=10.0,
                       disp_max=0.5, ratio_at_limit=2.0/3.0,
                       num_bigiter_initpath=1, num_bigiter_initparam=np.inf, num_bigiter=100,
                       num_iter=10000, islarge_num_iter=1, num_bigiter_hess=0, eps_hess=0.001,
                       load_file='', save_file='', quatern=0, visualize=0):

    if not load_file:

        ###
        ### THIS INFORMATION IS ASSUMED TO BE KNOWN BEFORE BEGINNING
        ###
    
        # number of images on the path (scalar):
        N_im = R_init.shape[0]
        # dimension of the space (scalar):
        D = R_init.shape[1]
        # minimum point 1 (ndarray of shape 1 x 'D'):
        min1 = R_init[:1,:]
        # length of the initial path (sum of the distances between consecutive images):
        scale = np.sum(np.sqrt(np.sum(np.square(np.diff(R_init,axis=0)),axis=1)))
        # energy and gradient at minimum point 1 (ndarrays of shape 1 x 1 and 1 x 'D'):
        E_min1, G_min1 = pot_general(min1)
        if E_min1.ndim < 2:
            print("ERROR: Modify your energy function so that it returns two-dimensional ndarrays (of shape 'N_im' x 1 and 'N_im' x 'D'), even if there is only one image in the input ('N_im' = 1)!")
            return
        # minimum point 2 (ndarray of shape 1 x 'D'):
        min2 = R_init[-1:,:]
        # energy and gradient at minimum point 2 (ndarrays of shape 1 x 1 and 1 x 'D'):
        E_min2, G_min2 = pot_general(min2)
        # Elevel = np.min((E_min1,E_min2)) # zero level of energy is set to the lower minimum (scalar)
        Elevel = E_min1 # zero level of energy is set to minimum point 1 (ndarray of shape 1 x 1)
        E_min1 = E_min1 - Elevel
        E_min2 = E_min2 - Elevel
        # define the "virtual Hessian" points if used:
        if num_bigiter_hess > 0:
            R_h = utils.get_hessian_points(R_init,eps_hess) 
            E_h, G_h = pot_general(R_h)
            E_h = E_h - Elevel
        else:
            R_h = np.ndarray(shape=(0,D))
            E_h = np.ndarray(shape=(0,1))
            G_h = np.ndarray(shape=(0,D))
        # coordinates of all observation points:
        R_all = np.vstack((R_h,min1,min2))
        # energy for all observation points:
        E_all = np.vstack((E_h,E_min1,E_min2))
        # gradient for all observation points:
        G_all = np.vstack((G_h,G_min1,G_min2))
    
        # initialize the GP model:
        ker_const = GPy.kern.Bias(input_dim=D)
        ker_const.variance.constrain_fixed()
        utils_atomic.update_active_fro(conf_info,conf_info_inactive,R_all,actdist_fro)
        print('{:g} active and {:g} inactive frozen atoms in the beginning.\n'.format(conf_info['conf_fro'].shape[0],conf_info_inactive['conf_ifro'].shape[0]))
        ker_sexpat = GPy.kern.RBF_atomic(input_dim=D, magnitude=1., lengthscale=np.ones(conf_info['n_pt']), conf_info=conf_info)
        ker = ker_const + ker_sexpat
        kernel_list = [ker]
        for dim in range(0,D):
            kernel_list += [GPy.kern.DiffKern(ker,dim)]
        lik = GPy.likelihoods.Gaussian()
        lik.variance.constrain_fixed(value=1e-8)
        likelihood_list = [lik]*(D+1)
        opt = paramz.optimization.optimization.opt_SCG(max_iters=1000, xtol=1e-4, ftol=1e-4, gtol=1e-4)
        gp_model = GPy.models.MultioutputGP(
            X_list=[R_all]*(D+1),
            Y_list=[E_all]+np.hsplit(G_all,D),
            kernel_list=kernel_list,
            likelihood_list=likelihood_list,
            inference_method=GPy.inference.latent_function_inference.exact_gaussian_inference.ExactGaussianInference())
       
        ###
        ### THE ALGORITHM BEGINS HERE
        ###

        # coordinates of the images (ndarray of shape 'N_im' x 'D'):
        R = R_init.copy()
        # latest evenly spaced path (no climbing image) (ndarray of shape 'N_im' x 'D'):
        R_latest_equal = np.ndarray(shape=(0,D))
        # previous evenly spaced path (no climbing image) if maximum number of inner iterations reached
        R_previous_equal = np.ndarray(shape=(0,D))     
        if T_CIon_gp > 0:
            # latest converged CI-path (ndarray of shape 'N_im' x 'D'):
            R_latest_climb = np.ndarray(shape=(0,D))
            # climbing image index among the intermediate images for the latest CI-path
            i_CI_latest = 0
            # previous CI-path if maximum number of inner iterations reached
            R_previous_climb = np.ndarray(shape=(0,D))
            # climbing image index among the intermediate images for the previous CI-path         
            i_CI_previous = 0
        eval_next_i = 0
        eval_next_CI = 0
        i_CI = 0
        # indicator of unevaluated images on the current path (ndarray of shape 'N_im' x 1):
        uneval = np.vstack((0,np.ones((N_im-2,1),dtype=int),0))
        # number of unevaluated images on the current path (scalar):
        N_uneval = np.sum(uneval)
        # energy at the images (ndarray of shape 'N_im' x 1):
        E_R = np.vstack((E_min1,np.zeros((N_im-2,1)),E_min2))
        # gradient at the images (ndarray of shape 'N_im' x 'D'):
        G_R = np.vstack((G_min1,np.zeros((N_im-2,D)),G_min2))
    
        # optimize the GP hyperparameters:
        if actdist_fro < np.inf:
            new_act = utils_atomic.update_active_fro(conf_info,conf_info_inactive,R[1:(N_im-1),:],actdist_fro)
            if new_act > 0:
                print('More frozen atoms activated. Now {:g} active and {:g} inactive frozen atoms.\n'.format(conf_info['conf_fro'].shape[0],conf_info_inactive['conf_ifro'].shape[0]))
                gp_model.kern.sum.RBF_atomic.conf_info = conf_info
        if num_bigiter_hess > 0:
            mean_y = np.mean(E_all[(2*D):,:])
            range_y = np.max(E_all[(2*D):,:])-np.min(E_all[(2*D):,:])
            range_x = np.max(utils_atomic.dist_at(R_all[(2*D):,:],R_all[(2*D):,:],conf_info,np.ones(conf_info['n_pt'])))
        else:
            mean_y = np.mean(E_all)
            range_y = np.max(E_all)-np.min(E_all)
            range_x = np.max(utils_atomic.dist_at(R_all,R_all,conf_info,np.ones(conf_info['n_pt'])))
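        gp_model.kern.sum.bias.variance = mean_y**2
        # initial values: medians of the zero-mean Gaussian priors restricted to positive values (set below),
        # since norm.ppf(0.75,0,s) = 0.6745*s is the median of |x| when x ~ N(0,s^2):
        gp_model.kern.sum.RBF_atomic.magnitude = norm.ppf(0.75,0,range_y/3)
        gp_model.kern.sum.RBF_atomic.lengthscale = norm.ppf(0.75,0,range_x/3)*np.ones(conf_info['n_pt'])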
        mag_prior = GPy.priors.Gaussian(mu=0.0,sigma=range_y/3)
        mag_prior.domain = '_POSITIVE'
        gp_model.kern.sum.RBF_atomic.magnitude.set_prior(mag_prior)
        len_prior = GPy.priors.Gaussian(mu=0.0,sigma=range_x/3)
        len_prior.domain = '_POSITIVE'
        gp_model.kern.sum.RBF_atomic.lengthscale.set_prior(len_prior)
        gp_model.set_XY([R_all]*(D+1),[E_all]+np.hsplit(G_all,D))
        gp_model.optimize(optimizer=opt)
        param_gp_init = gp_model[:]	
	
        # ndarray gathering accurate/estimated energies of the images after each evaluation (thus includes one or more accurate values):
        E_R_ae = np.ndarray(shape=(N_im,0))
        # ndarray gathering estimated energies of the images for each inner iteration:
        E_R_gp = np.ndarray(shape=(N_im,0))
        # ndarray gathering accurate/estimated maximum component of gradient perpendicular to the path tangent at each intermediate image after each evaluation (thus includes one or more accurate values):
        maxG_R_perp_ae = np.ndarray(shape=(N_im-2,0))
        # ndarray gathering estimated maximum component of the NEB force acting on each intermediate image for each inner iteration:
        maxF_R_gp = np.ndarray(shape=(N_im-2,0))
        # ndarray gathering accurate/estimated maximum component of gradient at the climbing image after each evaluation (0 if CI is off):
        maxG_CI_ae = np.ndarray(shape=(0))
        # ndarray gathering estimated maximum component of gradient at the climbing image for each inner iteration (0 if CI is off):
        maxG_CI_gp = np.ndarray(shape=(0))
        # ndarray gathering the total numbers of inner iterations before new observations were taken:
        obs_at = np.ndarray(shape=(0))
        # smallest accurate maximum component of perpendicular gradient obtained so far on any of the intermediate images (scalar):
        smallest_acc_maxG = np.inf
        # optimized GP hyperparameters after each evaluation:
        param_gp = np.ndarray(shape=(0,gp_model[:].shape[0]))
        bigiter_init = 0
	
        figs = []
        if visualize > 0:
            # prepare figure for visualization of the true energy along the spline interpolation of the path:
            from scipy.interpolate import CubicSpline
            fig1 = plt.figure(1)
            #plt.label('vispath')
            csr = np.arange(0,N_im*10)/(10*N_im-1)
            plt.title('True energy along cubic spline interpolation of the path')
            plt.xlabel('image number')
            figs.append(fig1)
    
    else:
        ##### NOTICE: IMPLEMENT HERE LOADING OF DATA FROM 'load_file' !!!        
        print('ERROR: LOADING DATA FROM FILE NOT IMPLEMENTED!')
        return
        # load(load_file)
        # bigiter_init = bigiter + 1   

    # OUTER ITERATION LOOP
    for ind_bigiter in range(bigiter_init,num_bigiter+1):
        
        if eval_next_i > 0:
            # acquire the accurate energy and gradient at the image that was too far from nearest observed data point:
            # index of the image to be evaluated among the intermediate images:
            i_eval = eval_next_i
            print('Evaluate image that caused early stopping (image {:g})...\n'.format(i_eval+1))
        elif eval_next_CI > 0 and i_CI > 0:
            # acquire the accurate energy and gradient at the climbing image:
            i_eval = i_CI
            print('Evaluate the climbing image (image {:g})...\n'.format(i_eval+1))
        else:
            # acquire the accurate energy and gradient at the image with the highest uncertainty:
            VarE_R = gp_model.predict_noiseless([R])[1]
            i_eval = np.argmax(uneval*VarE_R)
            print('Evaluate the image with the highest uncertainty (image {:g})...\n'.format(i_eval+1))
        uneval[i_eval,0] = 0
        N_uneval = np.sum(uneval)
        # coordinates of the image to be evaluated (ndarray of shape 1 x 'D'):
        R_eval = R[i_eval:i_eval+1,:].copy()
        R_all = np.vstack((R_all,R_eval))
        E_R_eval, G_R_eval = pot_general(R_eval)
        E_R_eval = E_R_eval - Elevel
        E_R[i_eval,0] = E_R_eval[0,0]
        G_R[i_eval,:] = G_R_eval
        E_all = np.vstack((E_all,E_R_eval))
        G_all = np.vstack((G_all,G_R_eval))
        
        # if all images evaluated, check final convergence:
        if N_uneval == 0:
            F_R, maxG_R_perp, maxG_CI, i_CI = method_force(R,E_R,G_R,param_force,T_CIon_gp)  
            if np.max(maxG_R_perp) < T_MEP and maxG_CI < T_CI:
                E_R_ae = np.hstack((E_R_ae,E_R))
                maxG_R_perp_ae = np.hstack((maxG_R_perp_ae,maxG_R_perp))
                maxG_CI_ae = np.hstack((maxG_CI_ae,maxG_CI))
                obs_at = np.hstack((obs_at,E_R_gp.shape[1]))
                print('Accurate values after {:g} image evaluations (including all {:g} intermediate images of the current path):\n maxE_R = {:.3g}, maxmaxG_R_perp = {:.3g}, minmaxG_R_perp = {:.3g}, maxG_CI = {:.3g} (image {:g}) \n\n'.format(ind_bigiter+1,N_im-2-N_uneval,np.max(E_R),np.max(maxG_R_perp),np.min(maxG_R_perp),maxG_CI,i_CI+1))
                print('Final convergence obtained after {:g} image evaluations.\n'.format(ind_bigiter+1))
                break
        
        # remove the "virtual Hessian" observations if needed:
        if num_bigiter_hess > 0 and ind_bigiter == num_bigiter_hess:
            R_all = np.delete(R_all,range(2*D),0)
            E_all = np.delete(E_all,range(2*D),0)
            G_all = np.delete(G_all,range(2*D),0)
        
        # update the GP model and optimize the hyperparameters:
        if actdist_fro < np.inf:
            new_act = utils_atomic.update_active_fro(conf_info,conf_info_inactive,R[1:(N_im-1),:],actdist_fro)
            if new_act > 0:
                print('More frozen atoms activated. Now {:g} active and {:g} inactive frozen atoms.\n'.format(conf_info['conf_fro'].shape[0],conf_info_inactive['conf_ifro'].shape[0]))
                gp_model.kern.sum.RBF_atomic.conf_info = conf_info
        if ind_bigiter < num_bigiter_hess:
            mean_y = np.mean(E_all[(2*D):,:])
            range_y = np.max(E_all[(2*D):,:])-np.min(E_all[(2*D):,:])
            range_x = np.max(utils_atomic.dist_at(R_all[(2*D):,:],R_all[(2*D):,:],conf_info,np.ones(conf_info['n_pt'])))
        else:
            mean_y = np.mean(E_all)
            range_y = np.max(E_all)-np.min(E_all)
            range_x = np.max(utils_atomic.dist_at(R_all,R_all,conf_info,np.ones(conf_info['n_pt'])))
        gp_model.kern.sum.bias.variance = mean_y**2
        mag_prior = GPy.priors.Gaussian(mu=0.0,sigma=range_y/3)
        mag_prior.domain = '_POSITIVE'
        gp_model.kern.sum.RBF_atomic.magnitude.set_prior(mag_prior)
        len_prior = GPy.priors.Gaussian(mu=0.0,sigma=range_x/3)
        len_prior.domain = '_POSITIVE'
        gp_model.kern.sum.RBF_atomic.lengthscale.set_prior(len_prior)
        if ind_bigiter+1 <= num_bigiter_initparam or conf_info['n_pt'] > gp_model.kern.sum.RBF_atomic.lengthscale.shape[0]:
            gp_model.kern.sum.RBF_atomic.magnitude = norm.ppf(0.75,0,range_y/3)
            gp_model.kern.sum.RBF_atomic.lengthscale = norm.ppf(0.75,0,range_x/3)*np.ones(conf_info['n_pt'])
        gp_model.set_XY([R_all]*(D+1),[E_all]+np.hsplit(G_all,D))
        gp_model.optimize(optimizer=opt)
        param_gp = np.vstack((param_gp,gp_model[:]))
        
        # update the estimated energy and gradient at the unevaluated images,
        # update the NEB forces and save the values to the gathering matrices/vector:
        EG_R = gp_model.predict_noiseless([R[np.nonzero(uneval)[0]]]*(D+1))[0]
        E_R[np.nonzero(uneval)[0]] = EG_R[:N_uneval,:]
        G_R[np.nonzero(uneval)[0]] = np.reshape(EG_R[N_uneval:,:],(D,N_uneval)).T
        F_R, maxG_R_perp, maxG_CI, i_CI = method_force(R,E_R,G_R,param_force,T_CIon_gp)
        E_R_ae = np.hstack((E_R_ae,E_R))
        maxG_R_perp_ae = np.hstack((maxG_R_perp_ae,maxG_R_perp))
        maxG_CI_ae = np.hstack((maxG_CI_ae,maxG_CI))		
        obs_at = np.hstack((obs_at,E_R_gp.shape[1]))	      
        if maxG_R_perp_ae[i_eval-1,-1] < smallest_acc_maxG:
            smallest_acc_maxG = maxG_R_perp_ae[i_eval-1,-1]
        if N_uneval == 0:
            if T_CIon_gp > 0:
                print('Accurate values after {:g} image evaluations (including all {:g} intermediate images of the current path):\n maxE_R = {:.3g}, maxmaxG_R_perp = {:.3g}, minmaxG_R_perp = {:.3g}, maxG_CI = {:.3g} (image {:g}) \n\n'.format(ind_bigiter+1,N_im-2-N_uneval,np.max(E_R_ae[:,-1]),np.max(maxG_R_perp_ae[:,-1]),np.min(maxG_R_perp_ae[:,-1]),maxG_CI_ae[-1],i_CI+1))
            else:
                print('Accurate values after {:g} image evaluations (including all {:g} intermediate images of the current path):\n maxE_R = {:.3g}, maxmaxG_R_perp = {:.3g}, minmaxG_R_perp = {:.3g} \n\n'.format(ind_bigiter+1,N_im-2-N_uneval,np.max(E_R_ae[:,-1]),np.max(maxG_R_perp_ae[:,-1]),np.min(maxG_R_perp_ae[:,-1])))
        else:
            if T_CIon_gp > 0:
                if uneval[i_CI,0] == 0:
                    print('Estimations after {:g} image evaluations (including {:g}/{:g} of the intermediate images of the current path):\n maxE_R = {:.3g}, maxmaxG_R_perp = {:.3g}, minmaxG_R_perp = {:.3g}, maxG_CI = {:.3g} (image {:g}, accurate) \n\n'.format(ind_bigiter+1,N_im-2-N_uneval,N_im-2,np.max(E_R_ae[:,-1]),np.max(maxG_R_perp_ae[:,-1]),np.min(maxG_R_perp_ae[:,-1]),maxG_CI_ae[-1],i_CI+1))
                else:
                    print('Estimations after {:g} image evaluations (including {:g}/{:g} of the intermediate images of the current path):\n maxE_R = {:.3g}, maxmaxG_R_perp = {:.3g}, minmaxG_R_perp = {:.3g}, maxG_CI = {:.3g} (image {:g}, estimation) \n\n'.format(ind_bigiter+1,N_im-2-N_uneval,N_im-2,np.max(E_R_ae[:,-1]),np.max(maxG_R_perp_ae[:,-1]),np.min(maxG_R_perp_ae[:,-1]),maxG_CI_ae[-1],i_CI+1))
            else:
                print('Estimations after {:g} image evaluations (including {:g}/{:g} of the intermediate images of the current path):\n maxE_R = {:.3g}, maxmaxG_R_perp = {:.3g}, minmaxG_R_perp = {:.3g} \n\n'.format(ind_bigiter+1,N_im-2-N_uneval,N_im-2,np.max(E_R_ae[:,-1]),np.max(maxG_R_perp_ae[:,-1]),np.min(maxG_R_perp_ae[:,-1])))
        
        # stop the algorithm if maximum number of outer iterations is reached:
        if ind_bigiter == num_bigiter:
            print('Stopped the algorithm: Maximum number of outer iterations ({:g}) reached.\n'.format(ind_bigiter))
            break
			 
        if visualize > 0:
            # visualize the true energy along the spline interpolation of the relaxed path:
            plt.figure(1)
            cs = CubicSpline(np.arange(0,N_im)/(N_im-1),R)
            R_spline = cs(csr)
            E_spline, G_spline = pot_general(R_spline)
            E_spline = E_spline - Elevel
            E_images, G_images = pot_general(R)
            E_images = E_images - Elevel
            plt.plot(csr*(N_im-1)+1,E_spline,'r')
            plt.plot(np.arange(1,N_im+1),E_images,'o',markeredgecolor='r',markerfacecolor='r')
        
        # If the (overoptimistic) GP estimate for the maximum component of NEB forces is over the
        # final convergence threshold, relaxation is started on the estimated GP surface.
        # Otherwise, evaluate the climbing image without moving the path if not already evaluated.
        # If the climbing image has already been evaluated and the additional convergence threshold for CI has been reached,
        # evaluate more images without moving the path, otherwise relax the path and re-evaluate the climbing image.
        eval_next_i = 0
        if np.max(maxG_R_perp_ae[:,-1]) >= T_MEP:
            start_relax = 1
            eval_next_CI = 0
        else:
            if uneval[i_CI,0] > 0:
                start_relax = 0
                eval_next_CI = 1
            else:
                if maxG_CI_ae[-1] < T_CI:
                    start_relax = 0
                    eval_next_CI = 0
                else:
                    start_relax = 1
                    eval_next_CI = 1

        # START A RELAXATION PHASE IF DESIRED
        if start_relax > 0:
            
            # define the convergence threshold for the relaxation phase:
            if divisor_T_MEP_gp > 0:
                # if this option is set on, the GP convergence threshold is 1/'divisor_T_MEP_gp'
                # of the smallest accurate 'maxG_R_perp' obtained so far on any of the intermediate images,
                # but not less than 1/10 of the lowest final threshold
                T_MEP_gp = max((smallest_acc_maxG/divisor_T_MEP_gp,np.min((T_MEP/10,T_CI/10))))
            else:
                # otherwise the GP convergence threshold is always 1/10 of the lowest final threshold
                T_MEP_gp = min((T_MEP,T_CI))/10
            
            # define the start path for the relaxation phase:
            if islarge_num_iter > 0 or R_previous_equal.shape[0] <= 0:
                if ind_bigiter+1 > num_bigiter_initpath and R_latest_equal.shape[0] > 0:
                    R = R_latest_equal.copy()
                    if T_CIon_gp > 0:
                        print('Started relaxation phase on round {:g} from the latest "preliminarily converged" evenly spaced path (no climbing image).\n'.format(ind_bigiter+1))
                    else:
                        print('Started relaxation phase on round {:g} from the latest converged path.\n'.format(ind_bigiter+1))
                else:
                    R = R_init.copy()
                    print('Started relaxation phase on round {:g} from the initial path.\n'.format(ind_bigiter+1))
            else:
                R = R_previous_equal.copy()
                print('Started relaxation phase on round {:g} where the previous one stopped.\n'.format(ind_bigiter+1))
                R_previous_equal = np.ndarray(shape=(0,D))
            
            # set climbing image mode off in the beginning:
            CI_on = 0
            # velocities of the intermediate images (given as an output of the previous step):
            V_old = np.zeros((N_im-2,D))
            # NEB forces on the intermediate images on the previous path:
            F_R_old = np.zeros((N_im-2,1))
            # indicator if zero velocity used (for the first iteration):
            zeroV = 1
            
            # INNER ITERATION LOOP
            for ind_iter in range(num_iter+1):
                
                # calculate estimated energy and gradient on the new path:
                EG_R = gp_model.predict_noiseless([R]*(D+1))[0]
                E_R = EG_R[:N_im,:].copy()
                G_R = np.reshape(EG_R[N_im:,:],(D,N_im)).T.copy()
                F_R, maxG_R_perp, maxG_CI, i_CI = method_force(R,E_R,G_R,param_force,T_CIon_gp) 
                maxF_R = np.max(np.abs(F_R),1)[np.newaxis].T
                 
                # turn climbing image option on and correct the NEB force accordingly if sufficiently relaxed:
                if CI_on <= 0 and np.max(maxF_R) < T_CIon_gp:
                    R_latest_equal = R.copy()
                    CI_on = 1
                    i_CI_test = np.argmax(E_R[1:-1,:])+1
                    print('Climbing image (image {:g}) turned on after {:g} inner iterations.\n'.format(i_CI_test+1,ind_iter))
                    if islarge_num_iter > 0 or R_previous_climb.shape[0] <= 0:
                        if ind_bigiter+1 > num_bigiter_initpath and R_latest_climb.shape[0] > 0:
                            R_start_climb = R_latest_climb.copy()
                            R_start_climb_text = 'latest converged'
                            i_CI_start = i_CI_latest.copy()
                        else:
                            R_start_climb = np.ndarray(shape=(0,D))
                            i_CI_start = 0
                    else:
                        R_start_climb = R_previous_climb.copy()
                        R_start_climb_text = 'previous'
                        R_previous_climb = np.ndarray(shape=(0,D))
                        i_CI_start = i_CI_previous.copy()
                        i_CI_previous = 0
                    if i_CI_test == i_CI_start:
                        EG_R_test2 = gp_model.predict_noiseless(Xnew=[R_start_climb]*(D+1))[0]
                        E_R_test2 = EG_R_test2[:N_im,:].copy()
                        i_CI_test2 = np.argmax(E_R_test2[1:-1,:])+1
                        if i_CI_test2 == i_CI_start:
                            R = R_start_climb.copy()
                            E_R = E_R_test2.copy()
                            G_R = np.reshape(EG_R_test2[N_im:,:],(D,N_im)).T.copy()
                            print('CI unchanged: continued from the {} CI-path.\n'.format(R_start_climb_text))
                    F_R, maxG_R_perp, maxG_CI, i_CI = method_force(R,E_R,G_R,param_force,CI_on)
                    maxF_R = np.max(np.abs(F_R),1)[np.newaxis].T
                    zeroV = 1
                    
                E_R_gp = np.hstack((E_R_gp,E_R))
                maxF_R_gp = np.hstack((maxF_R_gp,maxF_R))
                maxG_CI_gp = np.hstack((maxG_CI_gp,maxG_CI))
                
                # stop the relaxation phase if converged:
                if ( T_CIon_gp <= 0 or CI_on > 0 ) and np.max(maxF_R) < T_MEP_gp and ind_iter > 0:
                    if CI_on > 0:
                        R_latest_climb = R.copy()
                        i_CI_latest = i_CI.copy()
                        print('Stopped the relaxation phase: converged after {:g} inner iterations (CI: image {:g}).\n'.format(ind_iter,i_CI+1))
                        R_previous_climb = np.ndarray(shape=(0,D))
                        i_CI_previous = 0
                    else:
                        R_latest_equal = R.copy()
                        print('Stopped the relaxation phase: converged after {:g} inner iterations.\n'.format(ind_iter))
                    break
                    
                # stop the relaxation phase if maximum number of inner iterations reached:
                if ind_iter == num_iter:
                    if islarge_num_iter <= 0:
                        if CI_on > 0:
                            R_previous_climb = R.copy()
                            i_CI_previous = i_CI.copy()
                        else:
                            R_previous_equal = R.copy()
                            R_previous_climb = np.ndarray(shape=(0,D))
                            i_CI_previous = 0
                    print('Stopped the relaxation phase: maximum number of inner iterations ({:g}) reached.\n'.format(ind_iter))
                    break
                
                # move the path one step along the NEB force according to the chosen method:
                R_new, V_old = method_step(R,F_R,param_step,F_R_old,V_old,zeroV)
                zeroV = 0
                
                if actdist_fro < np.inf:
                    # check whether any inactive frozen atoms have become active, and update 'conf_info' and 'conf_info_inactive' accordingly:
                    new_act = utils_atomic.update_active_fro(conf_info,conf_info_inactive,R_new[1:(N_im-1),:],actdist_fro)
                    # if frozen atoms were activated, update the GP model and reoptimize the hyperparameters:
                    if new_act > 0:
                        print('More frozen atoms activated. Now {:g} active and {:g} inactive frozen atoms.\n'.format(conf_info['conf_fro'].shape[0],conf_info_inactive['conf_ifro'].shape[0]))
                        gp_model.kern.sum.RBF_atomic.conf_info = conf_info
                        if ind_bigiter < num_bigiter_hess:
                            range_x = np.max(utils_atomic.dist_at(R_all[(2*D):,:],R_all[(2*D):,:],conf_info,np.ones(conf_info['n_pt'])))
                        else:
                            range_x = np.max(utils_atomic.dist_at(R_all,R_all,conf_info,np.ones(conf_info['n_pt'])))
                        len_prior = GPy.priors.Gaussian(mu=0.0,sigma=range_x/3)
                        len_prior.domain = '_POSITIVE'
                        gp_model.kern.sum.RBF_atomic.lengthscale.set_prior(len_prior)
                        if ind_bigiter+1 <= num_bigiter_initparam or conf_info['n_pt'] > gp_model.kern.sum.RBF_atomic.lengthscale.shape[0]:
                            gp_model.kern.sum.RBF_atomic.lengthscale = norm.ppf(0.75,0,range_x/3)*np.ones(conf_info['n_pt'])
                        gp_model.set_XY([R_all]*(D+1),[E_all]+np.hsplit(G_all,D))
                        gp_model.optimize(optimizer=opt)
                        zeroV = 1
                
                # limit the move if step length is larger than 99 % of 'disp_max' times the length of the initial path or
                # if any atom-wise step length is larger than 99 % of 0.5*(1-'ratio_at_limit') times the minimum inter-atomic distance:
                steplength = np.sqrt(np.sum((R_new[1:(N_im-1),:]-R[1:(N_im-1),:])**2,1))
                steplength_atomwise = np.sqrt((R_new[1:(N_im-1),0::3]-R[1:(N_im-1),0::3])**2+(R_new[1:(N_im-1),1::3]-R[1:(N_im-1),1::3])**2+(R_new[1:(N_im-1),2::3]-R[1:(N_im-1),2::3])**2)
                steplength_atomwise_limit = 0.5*(1.0-ratio_at_limit)*utils_atomic.mindist_interatomic(R[1:(N_im-1),:],conf_info)
                if np.any(steplength > 0.99*disp_max*scale) or np.any(steplength_atomwise > 0.99*steplength_atomwise_limit):
                    step_coeff = np.min((np.ones(N_im-2),0.99*disp_max*scale/steplength),0)
                    step_coeff = np.min((step_coeff,0.99*np.min(steplength_atomwise_limit/steplength_atomwise,1)),0)
                    print('Warning: the step length of inner iteration {:g} limited.\n'.format(ind_iter+1))
                    R_new[1:(N_im-1),:] = R[1:(N_im-1),:] + step_coeff[:,None]*(R_new[1:(N_im-1),:]-R[1:(N_im-1),:])
                    zeroV = 1

                # STOPPING CRITERION FOR INTER-ATOMIC DISTANCES
                # reject the step and stop the relaxation phase if the following does not hold for some of the current images:
                # there is an observed data point so that all inter-atomic distances of the current image are more than 'ratio_at_limit'
                # (by default 2/3) but less than 1/'ratio_at_limit' (3/2) times the corresponding inter-atomic distance of the observed data point,
                # i.e., |log(r_im/r_nearobs)| < |log(ratio_at_limit)| ( = |log(2/3)| = 0.4055 )
                disp1D_nearest = np.min(utils_atomic.dist_max1Dlog(R_new[1:(N_im-1),:],R_all,conf_info),1)
                if np.max(disp1D_nearest) > abs(np.log(ratio_at_limit)):
                    eval_next_i = np.argmax(disp1D_nearest)+1
                    print('Stopped the relaxation phase after {:g} inner iterations: inter-atomic distance in image {:g} changes too much compared to "nearest" observed data point.\n'.format(ind_iter,eval_next_i+1))
                    if T_CIon_gp > 0:
                        R_previous_climb = np.ndarray(shape=(0,D))
                        i_CI_previous = 0
                    break
            
                #### STOPPING CRITERION FOR JOINT MOVEMENT OF ATOMS (OPTIONAL)
                #### reject the step and stop the relaxation phase if, for some of the current images, there does not exist
                #### an observed data point that fulfils the following requirement:
                #### for all moving atoms, the change in the position of the atom between the current image and
                #### the observed data point is not more than 1/2 of the distance from the atom to its nearest
                #### neighbour atom in the current image or the observed data point
                ###dispmaxrel_nearest = np.min(utils_atomic.dist_maxrel_atomwise3(R_new[1:(N_im-1),:],R_all,conf_info),1)
                ###if np.max(dispmaxrel_nearest) > 0.5:
                ###    eval_next_i = np.argmax(dispmaxrel_nearest)+1
                ###    print('Stopped the relaxation phase after {:g} inner iterations: atom position in image {:g} changes too much compared to "nearest" observed data point.\n'.format(ind_iter,eval_next_i+1))
                ###    if T_CIon_gp > 0:
                ###        R_previous_climb = np.ndarray(shape=(0,D))
                ###        i_CI_previous = 0
                ###    break
                
                #### ALTERNATIVE STOPPING CRITERION FOR JOINT MOVEMENT OF ATOMS (OPTIONAL)
                #### reject the step and stop the relaxation phase if, for some of the current images, there does not exist
                #### an observed data point that fulfils the following requirement:
                #### for all moving atoms, the change in the position of the atom between the current image and
                #### the observed data point is not more than 1/4 of the distance from the atom to its nearest
                #### neighbour atom in the observed data point
                ###dispmaxrel_nearest = np.min(utils_atomic.dist_maxrel_atomwise2(R_new[1:(N_im-1),:],R_all,conf_info),1)
                ###if np.max(dispmaxrel_nearest) > 0.25:
                ###    eval_next_i = np.argmax(dispmaxrel_nearest)+1
                ###    print('Stopped the relaxation phase after {:g} inner iterations: atom position in image {:g} changes too much compared to "nearest" observed data point.\n'.format(ind_iter,eval_next_i+1))
                ###    if T_CIon_gp > 0:
                ###        R_previous_climb = np.ndarray(shape=(0,D))
                ###        i_CI_previous = 0                
                ###    break
                
                # THE OLD STOPPING CRITERION FOR RAW DISPLACEMENT
                # reject the step and stop the relaxation phase if the distance from any current image to the
                # nearest observed data point is larger than 'disp_max' times the length of the initial path:
                disp_nearest = np.zeros((N_im-2,1))
                for i in range(1,N_im-1):
                    disp_nearest[i-1,0] = np.sqrt(np.min(np.sum(np.square(R_new[i,:]-R_all),1)))
                if np.max(disp_nearest) > disp_max*scale:
                    eval_next_i = np.argmax(disp_nearest)+1
                    print('Stopped the relaxation phase after {:g} inner iterations: image {:g} too far from the nearest observed data point.\n'.format(ind_iter,eval_next_i+1))
                    if T_CIon_gp > 0:
                        R_previous_climb = np.ndarray(shape=(0,D))
                        i_CI_previous = 0 
                    break
                
                # otherwise accept the step and continue the relaxation:
                R = R_new.copy()
                if quatern > 0:
                    for im in range(1,N_im-1):
                        R[im,:] = utils.doRotation(R[im,:],R[im-1,:])
                F_R_old = F_R.copy()
                
            # END OF INNER ITERATION LOOP
            
            # 'np.int' was removed in NumPy 1.24; the built-in 'int' is equivalent here:
            uneval = np.vstack((0,np.ones((N_im-2,1),dtype=int),0))
            N_uneval = np.sum(uneval)
            E_R = np.vstack((E_min1,np.zeros((N_im-2,1)),E_min2))
            G_R = np.vstack((G_min1,np.zeros((N_im-2,D)),G_min2))
        
        # END OF THE RELAXATION PHASE
            
        if save_file:
            ##### NOTICE: IMPLEMENT HERE SAVING DATA TO FILE !!!
            print('ERROR: SAVING DATA TO FILE NOT IMPLEMENTED!')
            #save(save_file)

    # END OF OUTER ITERATION LOOP
    
    if visualize > 0:
        # visualize the true energy along the spline interpolation of the final path:
        plt.figure(1)
        cs = CubicSpline(np.arange(0,N_im)/(N_im-1),R)
        R_spline = cs(csr)
        E_spline, G_spline = pot_general(R_spline)
        E_spline = E_spline - Elevel
        E_images, G_images = pot_general(R)
        E_images = E_images - Elevel
        # Matplotlib property names are case-sensitive in current releases, so use the lowercase forms:
        plt.plot(csr*(N_im-1)+1,E_spline,'b',linewidth=2)
        plt.plot(np.arange(1,N_im+1),E_images,'o',markeredgecolor='b',markerfacecolor='b')
        
    return R, E_R, G_R, i_CI, gp_model, R_all, E_all, G_all, Elevel, obs_at, E_R_ae, E_R_gp, maxG_R_perp_ae, maxF_R_gp, maxG_CI_ae, maxG_CI_gp, param_gp, figs
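

##### ILLUSTRATIVE SKETCH (an assumption-laden example, not part of the original algorithm):
##### The step limitation in the inner iteration loop scales the move of each intermediate
##### image so that its Euclidean length stays below a given bound. The helper below is a
##### minimal stand-alone version of that idea. The name 'sketch_limit_step' and its arguments
##### are hypothetical; unlike the loop above, it applies no 0.99 safety factor and
##### ignores the atom-wise limit based on the inter-atomic distances.
import numpy as np

def sketch_limit_step(R, R_new, max_len):
    # R, R_new: ndarrays of shape 'N' x 'D' (current and proposed coordinates of 'N' images)
    # max_len:  positive upper bound for the Euclidean step length of each image
    step = R_new - R
    length = np.sqrt(np.sum(step**2,1))
    # a zero-length step gives an infinite ratio, which the minimum below clips to 1:
    with np.errstate(divide='ignore'):
        coeff = np.minimum(1.0,max_len/length)
    # shorten only the steps that exceed the bound, keeping their direction:
    return R + coeff[:,None]*step

##### Example use: sketch_limit_step(np.zeros((2,3)), np.array([[3.0,4.0,0.0],[0.3,0.4,0.0]]), 1.0)
##### shortens the first step (length 5) to length 1 and leaves the second step (length 0.5) unchanged.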
