(mlp-sscha)=

# Temperature dependent force constants calculation using pypolymlp and symfc

**This is an experimental feature, and its usage may change occasionally.**

```{note}
This feature is supported through the `phonopy-load` command but not the `phonopy`
command.
```

With the `--pypolymlp` option, phonopy can interface with the polynomial machine
learning potential (MLP) code,
[pypolymlp](https://github.com/sekocha/pypolymlp), to perform training and
evaluation tasks of MLPs. This feature aims to reduce the computational cost of
anharmonic force constant calculations by using MLPs as an intermediary layer,
efficiently representing atomic interactions. The example is found at
`example/KCl-SSCHA`.

The training process involves using a dataset consisting of supercell
displacements, forces, and energies. The trained MLPs are then employed to
compute forces for supercells with specific displacements.

For further details on combining phonopy calculations with pypolymlp, refer to
<u>A. Togo and A. Seko, J. Chem. Phys. **160**, 211001 (2024)</u>
[[doi](https://doi.org/10.1063/5.0211296)]
[[arxiv](https://arxiv.org/abs/2401.17531)].

Using the polynomial MLPs, stochastic self-consistent harmonic approximation
(SSCHA) calculation is performed in the following sections. By this, temperature
dependent force constants are calculated within SSCHA. About SSCHA, please refer
the paper by <u>L. Monacelli *et al.*, J. Phys.: Condens. Matter **33** 363001
(2021)</u> and <u>A. van Roekeghem A, *et al.*, Comput. Phys. Commun. **263**
107945 (2021)</u>. Technically, the computational procedure introduced here is
equivalent to the approach of the latter paper.

## Citation of pypolymlp

"Tutorial: Systematic development of polynomial machine learning potentials for
elemental and alloy systems", A. Seko, J. Appl. Phys. **133**, 011101 (2023)
[[doi](https://doi.org/10.1063/5.0129045)].

```
@article{pypolymlp,
  author = {Seko, Atsuto},
  title = "{"Tutorial: Systematic development of polynomial machine learning potentials for elemental and alloy systems"}",
  journal = {J. Appl. Phys.},
  volume = {133},
  number = {1},
  pages = {011101},
  year = {2023},
  month = {01},
}
```

## Citation of symfc

"Projector-based efficient estimation of force constants",
A. Seko and A. Togo, Phys. Rev. B, **110**, 214302 (2024) [[doi](https://doi.org/10.1103/PhysRevB.110.214302)]
[[arxiv](https://arxiv.org/abs/2403.03588)].

```
@article{PhysRevB.110.214302,
  title = {Projector-based efficient estimation of force constants},
  author = {Seko, Atsuto and Togo, Atsushi},
  journal = {Phys. Rev. B},
  volume = {110},
  issue = {21},
  pages = {214302},
  numpages = {18},
  year = {2024},
  month = {Dec},
}
```

## Requirements

- [pypolymlp](https://github.com/sekocha/pypolymlp) >= 0.4.6, < 0.9

  For linux (x86-64), a compiled package of pypolymlp can be installed via
  conda-forge (recommended). Otherwise, pypolymlp can be installed from
  source-code.

- [symfc](https://github.com/sekocha/pypolymlp) >= 1.1.7

## How to calculate

### Workflow

1. Generate random displacements in supercells. Use {ref}`--rd
   <random_displacements_tag>` option.
2. Calculate corresponding forces and energies in supercells. Use of VASP
   interface is recommended for {ref}`--sp <save_params_option>` option is
   supported.
3. Prepare dataset composed of displacements, forces, and energies in
   supercells. The dataset must be stored in a phonopy-yaml-like file, e.g.,
   `phonopy_params.yaml`. Use {ref}`-f <f_force_sets_option>` and {ref}`--sp
   <save_params_option>` option simultaneously.
4. Develop MLPs. By default, 90 and 10 percents of the dataset are used for the
   training and test, respectively. At this step `polymlp.yaml` is saved.
5. Generate random displacements in supercells
6. Evaluate MLPs for forces of the supercells generated in step 5.
7. Calculate force constants from displacement-force dataset from steps 5 and 6.
8. Temperature dependent force constants calculation

The steps 4-7 are executed in running phonopy with `--pypolymlp`
option.

### Steps 1-3: Dataset preparation

For the training, the following supercell data are required in the phonopy
setting to use pypolymlp:

- Displacements
- Forces
- Total energies

These data must be stored in phonopy.yaml-like file.

The supercells with displacements are generated by

```
% phonopy --pa auto --rd 1000 -c POSCAR-unitcell --dim 2 2 2 --amin 0.03 --amax 1.5
        _
  _ __ | |__   ___  _ __   ___   _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ | '_ \| | | |
 | |_) | | | | (_) | | | | (_) || |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___(_) .__/ \__, |
 |_|                            |_|    |___/
                                      2.31.1

Compiled with OpenMP support (max 10 threads).
Python version 3.12.6
Spglib version 2.5.0

Crystal structure was read from "POSCAR-unitcell".
Unit of length: angstrom
Displacements creation mode
  Number of supercells with random displacements: 1000
  Min displacement distance: 0.03
  Max displacement distance: 1.5
Settings:
  Supercell: [2 2 2]
  Primitive matrix (Auto):
    [0.  0.5 0.5]
    [0.5 0.  0.5]
    [0.5 0.5 0. ]
Spacegroup: Fm-3m (225)
Number of symmetry operations in supercell: 1536
Use -v option to watch primitive cell, unit cell, and supercell structures.

"phonopy_disp.yaml" and supercells have been created.

Summary of calculation was written in "phonopy_disp.yaml".
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|
```

For the generated supercells, forces and energies are calculated. Here it is
assumed to use the VASP code. Once the calculations are complete, the data
(forces and energies) can be extracted using the following command:

```bash
% phonopy --sp -f vasprun_xmls/vasprun-{001..120}.xml
```

This command extracts the necessary data and stores it in the
`phonopy_params.yaml` file. For more details, refer to the description of the
{ref}`--sp <save_params_option>` option. Currently, supercell energy extraction
from calculator outputs is only supported when using the VASP interface.

### Steps 4: Develop MLPs

Having `phonopy_params.yaml`, phonopy is executed with `--pypolymlp` option,

```
% phonopy-load phonopy_mlpsscha_params_KCl-120.yaml.xz --pypolymlp --mlp-params="ntrain=100, ntest=20"
        _
  _ __ | |__   ___  _ __   ___   _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ | '_ \| | | |
 | |_) | | | | (_) | | | | (_) || |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___(_) .__/ \__, |
 |_|                            |_|    |___/
                                      2.34.1

-------------------------[time 2025-01-15 13:21:26]-------------------------
Compiled with OpenMP support (max 10 threads).
Running in phonopy.load mode.
Python version 3.12.6
Spglib version 2.5.0

Crystal structure was read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
Unit of length: angstrom
Settings:
  Supercell: [2 2 2]
  Primitive matrix:
    [0.  0.5 0.5]
    [0.5 0.  0.5]
    [0.5 0.5 0. ]
Spacegroup: Fm-3m (225)
Number of symmetry operations in supercell: 1536
Use -v option to watch primitive cell, unit cell, and supercell structures.

NAC parameters were read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
Displacement-force dataset was read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
----------------------------- pypolymlp start ------------------------------
Pypolymlp is a generator of polynomial machine learning potentials.
Please cite the paper: A. Seko, J. Appl. Phys. 133, 011101 (2023).
Pypolymlp is developed at https://github.com/sekocha/pypolymlp.
Parameters:
  cutoff: 8.0
  model_type: 3
  max_p: 2
  gtinv_order: 3
  gtinv_maxl: (8, 8)
  gaussian_params1: (1.0, 1.0, 1)
  gaussian_params2: (0.0, 7.0, 10)
  ntrain: 100
  ntest: 20
Developing MLPs by pypolymlp...
Regression: cholesky decomposition ...
- alpha: 0.001
- alpha: 0.01
- alpha: 0.1
- alpha: 1.0
- alpha: 10.0
Clear training X.T @ X
Calculate X.T @ X for test data
Clear test X.T @ X
Regression: model selection ...
- alpha = 1.000e-03 : rmse (train, test) = 0.02432 0.23669
- alpha = 1.000e-02 : rmse (train, test) = 0.03613 0.16766
- alpha = 1.000e-01 : rmse (train, test) = 0.07193 0.22140
- alpha = 1.000e+00 : rmse (train, test) = 0.11563 0.26042
- alpha = 1.000e+01 : rmse (train, test) = 0.19375 0.31767
MLPs were written into "polymlp.yaml"
------------------------------ pypolymlp end -------------------------------
Generate displacements (--rd or -d) for proceeding to phonon calculations.
Dataset generated using MMLPs was written in "phonopy_mlp_eval_dataset.yaml".

Summary of calculation was written in "phonopy.yaml".
-------------------------[time 2025-01-15 13:22:42]-------------------------
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|
```

Information about the development of MLPs using pypolymlp is provided between
the `pypolymlp start` and `pypolymlp end` sections. The polynomial MLPs are
saved in the `polymlp.yaml` file. This file is automatically searched in
subsequent phonopy executions with the `--pypolymlp` option and reused.

#### Step 5-8: Temperature dependent force constants calculation

After the MLPs are developed, random displacements are generated with a
displacement distance of 0.001 Angstrom. The forces for these supercells are
then evaluated using pypolymlp. Both the generated displacements and the
corresponding forces are stored in the `phonopy_mlp_eval_dataset` file.
The calculated force constants may be refered as the harmonic force constants.


After the last step, the `polymlp.yaml` file exists in the current directory.
This file is read automatically in the next calculation with the `--pypolymlp`
option. If the developed MLPs can predict well forces at relatively large
displacements, temperature dependent force constants are calculated with the
`--sscha NUMBER_OF_ITERATIONS` option.

```
% phonopy-load phonopy_mlpsscha_params_KCl-120.yaml.xz --pypolymlp --sscha 10 --rd-temperature 300 --rd 1000
        _
  _ __ | |__   ___  _ __   ___   _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ | '_ \| | | |
 | |_) | | | | (_) | | | | (_) || |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___(_) .__/ \__, |
 |_|                            |_|    |___/
                                      2.34.1

-------------------------[time 2025-01-19 17:44:21]-------------------------
Compiled with OpenMP support (max 10 threads).
Running in phonopy.load mode.
Python version 3.12.6
Spglib version 2.5.0

Crystal structure was read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
Unit of length: angstrom
Pypolymlp displacements creation mode
  Random displacements
  Number of supercells with random displacements: 1000
  Temperatuere to generate random displacements: 300.0
Settings:
  Supercell: [2 2 2]
  Primitive matrix:
    [0.  0.5 0.5]
    [0.5 0.  0.5]
    [0.5 0.5 0. ]
Spacegroup: Fm-3m (225)
Number of symmetry operations in supercell: 1536
Use -v option to watch primitive cell, unit cell, and supercell structures.

NAC parameters were read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
Displacement-force dataset was read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
----------------------------- pypolymlp start ------------------------------
Pypolymlp is a generator of polynomial machine learning potentials.
Please cite the paper: A. Seko, J. Appl. Phys. 133, 011101 (2023).
Pypolymlp is developed at https://github.com/sekocha/pypolymlp.
Load MLPs from "polymlp.yaml".
------------------------------ pypolymlp end -------------------------------
Generate random displacements
  Twice of number of snapshots will be generated for plus-minus displacements.
  Displacement distance: 0.01
Evaluate forces in 2000 supercells by pypolymlp
-------------------------------- Symfc start -------------------------------
Symfc is a force constants calculator. See the following paper:
A. Seko and A. Togo, Phys. Rev. B, 110, 214302 (2024).
Symfc is developed at https://github.com/symfc/symfc.
Computing [2] order force constants.
Increase log-level to watch detailed symfc log.
--------------------------------- Symfc end --------------------------------
Max drift of force constants: -0.000000 (yy) -0.000000 (yy)

------------------------------- SSCHA start --------------------------------
Use provided force constants.

[ SSCHA iteration 1 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.001, 0.085] ****
  [0.085, 0.168] *********************
  [0.168, 0.252] *******************************
  [0.252, 0.336] *************************
  [0.336, 0.419] *************
  [0.419, 0.503] ****
  [0.503, 0.587] *
  [0.587, 0.670]
  [0.670, 0.754]
  [0.754, 0.838]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.253 meV
SSCHA force constants are written into "phonopy_sscha_fc_1.yaml.xz".

[ SSCHA iteration 2 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.006, 0.084] ****
  [0.084, 0.161] ******************
  [0.161, 0.238] *****************************
  [0.238, 0.315] *************************
  [0.315, 0.392] ***************
  [0.392, 0.469] ******
  [0.469, 0.546] **
  [0.546, 0.623]
  [0.623, 0.701]
  [0.701, 0.778]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.345 meV
SSCHA force constants are written into "phonopy_sscha_fc_2.yaml.xz".

[ SSCHA iteration 3 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.005, 0.087] ****
  [0.087, 0.170] *********************
  [0.170, 0.252] *******************************
  [0.252, 0.335] *************************
  [0.335, 0.417] *************
  [0.417, 0.499] ****
  [0.499, 0.582] *
  [0.582, 0.664]
  [0.664, 0.747]
  [0.747, 0.829]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.301 meV
SSCHA force constants are written into "phonopy_sscha_fc_3.yaml.xz".

[ SSCHA iteration 4 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.006, 0.081] ****
  [0.081, 0.155] *****************
  [0.155, 0.230] ***************************
  [0.230, 0.305] *************************
  [0.305, 0.380] ****************
  [0.380, 0.455] *******
  [0.455, 0.530] **
  [0.530, 0.604] *
  [0.604, 0.679]
  [0.679, 0.754]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.252 meV
SSCHA force constants are written into "phonopy_sscha_fc_4.yaml.xz".

[ SSCHA iteration 5 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.004, 0.079] ****
  [0.079, 0.154] *****************
  [0.154, 0.230] ***************************
  [0.230, 0.305] **************************
  [0.305, 0.380] ****************
  [0.380, 0.455] *******
  [0.455, 0.530] **
  [0.530, 0.605] *
  [0.605, 0.680]
  [0.680, 0.755]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.403 meV
SSCHA force constants are written into "phonopy_sscha_fc_5.yaml.xz".

[ SSCHA iteration 6 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.003, 0.080] ****
  [0.080, 0.157] *****************
  [0.157, 0.234] ****************************
  [0.234, 0.311] **************************
  [0.311, 0.388] ***************
  [0.388, 0.465] *******
  [0.465, 0.542] **
  [0.542, 0.619] *
  [0.619, 0.695]
  [0.695, 0.772]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.145 meV
SSCHA force constants are written into "phonopy_sscha_fc_6.yaml.xz".

[ SSCHA iteration 7 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.009, 0.085] ****
  [0.085, 0.160] ******************
  [0.160, 0.236] ****************************
  [0.236, 0.311] *************************
  [0.311, 0.386] ****************
  [0.386, 0.462] *******
  [0.462, 0.537] **
  [0.537, 0.612] *
  [0.612, 0.688]
  [0.688, 0.763]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.276 meV
SSCHA force constants are written into "phonopy_sscha_fc_7.yaml.xz".

[ SSCHA iteration 8 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.006, 0.089] *****
  [0.089, 0.173] **********************
  [0.173, 0.256] *******************************
  [0.256, 0.339] ************************
  [0.339, 0.422] *************
  [0.422, 0.506] ****
  [0.506, 0.589] *
  [0.589, 0.672]
  [0.672, 0.756]
  [0.756, 0.839]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.274 meV
SSCHA force constants are written into "phonopy_sscha_fc_8.yaml.xz".

[ SSCHA iteration 9 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.009, 0.087] ****
  [0.087, 0.165] *******************
  [0.165, 0.243] *****************************
  [0.243, 0.321] *************************
  [0.321, 0.399] **************
  [0.399, 0.477] ******
  [0.477, 0.555] **
  [0.555, 0.633]
  [0.633, 0.712]
  [0.712, 0.790]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.193 meV
SSCHA force constants are written into "phonopy_sscha_fc_9.yaml.xz".

[ SSCHA iteration 10 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.004, 0.080] ***
  [0.080, 0.156] *****************
  [0.156, 0.232] ****************************
  [0.232, 0.308] **************************
  [0.308, 0.384] ****************
  [0.384, 0.460] *******
  [0.460, 0.536] **
  [0.536, 0.611] *
  [0.611, 0.687]
  [0.687, 0.763]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.270 meV
SSCHA force constants are written into "phonopy_sscha_fc_10.yaml.xz".

-------------------------------- SSCHA end ---------------------------------
----------------------------------------------------------------------------
 One of the following run modes may be specified for phonon calculations.
 - Mesh sampling (MESH, --mesh)
 - Q-points (QPOINTS, --qpoints)
 - Band structure (BAND, --band)
 - Animation (ANIME, --anime)
 - Modulation (MODULATION, --modulation)
 - Characters of Irreps (IRREPS, --irreps)
 - Create displacements (CREATE_DISPLACEMENTS, -d)
----------------------------------------------------------------------------

Summary of calculation was written in "phonopy.yaml".
-------------------------[time 2025-01-19 17:48:18]-------------------------
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|
```

The final force constants are stored in files named
`phonopy_sscha_fc_NUM.yaml.xz`, where `NUM` represents the integer corresponding
to the iteration step. By performing a sufficient number of SSCHA iterations and
utilizing a sufficiently large set of supercells with random displacements at a
given temperature, the SSCHA force constants can be reliably determined. The
convergence of these force constants can be monitored through the SSCHA free
energy. Additionally, convergence can be assessed by plotting the phonon band
structures corresponding to the SSCHA force constants at the iteration
steps. For example:

```
% for i in {0..10}; do phonopy-load phonopy_sscha_fc_$i.yaml.xz --band auto --band-points 101; mv band.yaml band-$i.yaml; done
% phonopy-bandplot band-{0..10}.yaml --legend
```

## Parameters for developing MLPs

A few parameters can be specified using the `--mlp-params` option for the
development of MLPs. The parameters are provided as a string, e.g.,

```bash
% phonopy-load phonopy_params.yaml --pypolymlp --mlp-params="ntrain=80, ntest=20"
```

Parameters are separated by commas for configuration. A brief explanation of the
available parameters can be found in the docstring of `PypolymlpParams` that is
found by

```python
In [1]: from phonopy.interface.pypolymlp import PypolymlpParams

In [2]: help(PypolymlpParams)
```

`ntrain` and `ntest` are implemented in phonopy, while the remaining parameters
are directly passed to pypolymlp. Optimizing pypolymlp parameters can be
difficult, both in terms of achieving accuracy and managing the computational
resources required. The current default parameters are likely suitable for
systems up to ternary compounds. For binary systems, the calculations can
generally be run on standard laptop computers, but for ternary systems, around
40 GB of memory or more may be necessary.

For parameter adjustments, it is recommended to consult the
[pypolymlp](https://github.com/sekocha/pypolymlp) documentation and review the
 relevant research papers.

### `ntrain` and `ntest`

This method provides a straightforward dataset split: the first `ntrain`
supercells from the list are used for training, while the last `ntest`
supercells are reserved for testing.

## Convergence with respect to dataset size

In general, increasing the amount of data improves the accuracy of representing
force constants. Therefore, it is recommended to check the convergence of the
target property with respect to the number of supercells in the training
dataset. Lattice thermal conductivity may be a convenient property to monitor
when assessing convergence.

For example, by preparing an initial set with 100 supercell data, calculations
can then be performed by varying the size of the training dataset while keeping
the test dataset unchanged as follows:

```bash
% phonopy-load --pypolymlp --mlp-params="ntrain=20, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-20
% phonopy-load --pypolymlp --mlp-params="ntrain=40, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-40
% phonopy-load --pypolymlp --mlp-params="ntrain=60, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-60
% phonopy-load --pypolymlp --mlp-params="ntrain=80, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-80
% phonopy-load --pypolymlp --mlp-params="ntrain=100, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-100
```

The computed phonon band structures are plotted against the size of the training
dataset to observe the frequency convergence. If it has not converged, an
additional set of supercell data (e.g., forces and energies in the next 100
supercells) will be computed and included. With this procedure in mind, it may
be convenient to generate a sufficiently large number of supercells with random
displacements in advance, such as 1000 supercells, before starting the
temperature dependent force constants calculation with pypolymlp.


## Converting `phonopy.pmlp` to `polymlp.yaml`

In older versions, polynomial MLPs were stored in `phonopy.pmlp`. This file can
be converted to `polymlp.yaml` using the following Python snippet.

```python
from pypolymlp.mlp_dev.pypolymlp import Pypolymlp
polymlp = Pypolymlp()
polymlp.convert_to_yaml(filename_txt="phonopy.pmlp", filename_yaml="polymlp.yaml”)
```
