# Automated Code Generation

```{caution}
The **sourcegen** utility is an experimental part of Cantera and may be changed without
notice.
```

Cantera's **sourcegen** utility is a Python-based source generator for creating Cantera
interfaces for other languages. The following output options are supported:

- `clib`: [](clib-extensions); used to generate the [CLib API](../clib/index).
- `csharp`: [](dotnet-extensions); used to implement the [.NET API](../dotnet/index).
- `yaml`: [](yaml-extensions); simple illustration that summarizes the CLib interface
  in YAML format.

## Usage

The sourcegen utility is a command-line tool that is used for code generation. It can be
invoked without installation as

```bash
python -m interfaces.sourcegen.src.sourcegen <list-of-options>
```

from the root folder of the Cantera source code. For frequent use, it is recommended
to install sourcegen into the same virtual environment that SCons uses (for example,
the Conda environment used to compile Cantera from source) via:

```bash
python -m pip install -e interfaces/sourcegen
```

where the `-e` (or `--editable`) option ensures that changes to the sourcegen utility
take effect without reinstalling. Running:

```shell
% sourcegen --help
```

displays the following help text:

```shell
usage: sourcegen [-h] [-v] [--api {clib,csharp,yaml}] [--output OUTPUT] [--root ROOT]

Source generator for creating Cantera interface code.

options:
  -h, --help            show this help message and exit
  -v, --verbose         show additional logging output
  --api {clib,csharp,yaml}
                        language of generated Cantera API code
  --output OUTPUT       specifies the OUTPUT folder name
  --root ROOT           specifies the Cantera source ROOT folder (default is '.')
```
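
To make the option semantics concrete, the following is a minimal sketch of an `argparse` parser that accepts the same options as the help text above. It is an illustration only and not the utility's actual implementation; the function name `build_parser` and the output folder `build/yaml` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options shown in the help text (sketch only)."""
    parser = argparse.ArgumentParser(
        prog="sourcegen",
        description="Source generator for creating Cantera interface code.",
    )
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="show additional logging output")
    parser.add_argument("--api", choices=["clib", "csharp", "yaml"],
                        help="language of generated Cantera API code")
    parser.add_argument("--output", help="specifies the OUTPUT folder name")
    parser.add_argument("--root", default=".",
                        help="specifies the Cantera source ROOT folder (default is '.')")
    return parser


# Hypothetical invocation: generate the YAML summary into an assumed folder.
args = build_parser().parse_args(["--api", "yaml", "--output", "build/yaml", "-v"])
print(args.api, args.output, args.root, args.verbose)
```

Because `--root` is not given in this example, it falls back to the documented default `'.'`.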

## Overview

The sourcegen utility parses the XML tree generated by
[Doxygen](https://www.doxygen.org), using YAML configuration files that provide
instructions for constructing CLib interface functions from underlying C++ functions
and methods. For more information, see [](sourcegen-config).

The utility generates the [CLib API](clib-extensions) itself, as well as language
interfaces built on top of it, such as the .NET interface.

(sec-sourcegen-details)=
## Details

Automatic code generation involves initialization steps to resolve CLib interface
information using [](sourcegen-config). A subsequent scaffolding step delegates the
source generation to a language-specific sub-package.

1. **Parse Header File Specifications:**

   The command-line utility relies on a `HeaderFileParser` object that parses
   [](sec-sourcegen-specifications) and generates intermediate `HeaderFile` objects that
   represent individual CLib modules. The `HeaderFile` dataclass contains the following
   information:

   - `path`: Output folder.
   - `funcs`: List of functions to be scaffolded (initially empty).
   - `prefix`: Prefix used for CLib function names.
   - `base`: Base class of C++ methods (if applicable).
   - `parents`: List of C++ parent class(es).
   - `derived`: Dictionary of derived C++ class(es) and alternative prefixes.
   - `recipes`: List of header recipes read from YAML.
   - `docstring`: Lines representing docstring of YAML file.

   Each YAML specification file results in exactly one `HeaderFile` object.

1. **Resolve Recipes:**

   As a minimum, a [YAML Recipe](sec-sourcegen-recipes) specifies a `name` that either
   corresponds to a function within the `Cantera` namespace or a method or variable of
   the implemented base class. The `CLibSourceGenerator.resolve_tags` method is used
   to cross-reference individual recipes with known Doxygen tags. The information is
   used to detect the [CLib Function Type](sec-sourcegen-function-types) of a recipe and
   to generate a corresponding `CFunc` object that holds relevant CLib interface
   information used for subsequent scaffolding:

   - `ret_type`: Return type of CLib function.
   - `name`: Name of CLib function.
   - `arglist`: CLib function argument list.
   - `brief`: Brief description.
   - `wraps`: Implemented C++ function/method (if applicable).
   - `returns`: Description of returned value.
   - `base`: Qualified scope of function/method (if applicable).
   - `uses`: List of auxiliary C++ methods (if applicable).

   The information is used to update `recipes` list entries and build the `funcs` list
   for each `HeaderFile` object.

1. **Language-Specific Source Generation:**

   Each language-specific sub-package is required to export a class that derives from
   `SourceGenerator` and implements a `generate_source` method that takes a list of
   `HeaderFile` objects with their resolved recipes as an argument. The
   `generate_source` method uses this information to generate syntactically correct
   source code in the destination language.

   Each sub-package can contain a YAML-based config file named `config.yaml`. The core
   script recognizes two special keys:

   - `ignore_files`: a list of YAML specification file names\
     These files are excluded entirely from source generation, for example because
     they cannot be parsed directly or contain functionality that is not planned for
     implementation in the destination language.
   - `ignore_funcs`: a mapping of YAML specification file names to lists of recipe
     names\
     The listed recipes contained within those files will not be scaffolded, for
     example because they cannot be translated automatically and need to be written by
     hand in the destination language.

   The config file may contain additional values for use by the language-specific
   sub-package.
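
The data structures and config keys described in the steps above can be sketched as follows. This is a simplified illustration, not the sourcegen implementation: the field names follow the lists above, but the type annotations, default values, the helper `apply_ignores`, and all file and recipe names (`demo.yaml`, `legacy.yaml`, `getValue`, `handWritten`) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class CFunc:
    """Sketch of a resolved CLib function; types and defaults are assumed."""
    ret_type: str                     # return type of CLib function
    name: str                         # name of CLib function
    arglist: list[str]                # CLib function argument list
    brief: str = ""                   # brief description
    wraps: Optional[str] = None       # implemented C++ function/method
    returns: str = ""                 # description of returned value
    base: str = ""                    # qualified scope of function/method
    uses: list[str] = field(default_factory=list)  # auxiliary C++ methods


@dataclass
class HeaderFile:
    """Sketch of one CLib module; types and defaults are assumed."""
    path: Path                                         # output folder
    funcs: list[CFunc] = field(default_factory=list)   # initially empty
    prefix: str = ""                                   # CLib name prefix
    base: str = ""                                     # C++ base class
    parents: list[str] = field(default_factory=list)
    derived: dict[str, str] = field(default_factory=dict)
    recipes: list[dict] = field(default_factory=list)
    docstring: list[str] = field(default_factory=list)


def apply_ignores(recipes_by_file: dict[str, list[str]],
                  config: dict) -> dict[str, list[str]]:
    """Drop ignored files and recipes (hypothetical helper, assumed semantics)."""
    ignore_files = set(config.get("ignore_files", []))
    ignore_funcs = config.get("ignore_funcs", {})
    kept = {}
    for file_name, names in recipes_by_file.items():
        if file_name in ignore_files:
            continue  # the whole specification file is skipped
        skipped = set(ignore_funcs.get(file_name, []))
        kept[file_name] = [name for name in names if name not in skipped]
    return kept


# Hypothetical config contents and recipe names, for illustration only.
config = {
    "ignore_files": ["legacy.yaml"],
    "ignore_funcs": {"demo.yaml": ["handWritten"]},
}
recipes_by_file = {
    "legacy.yaml": ["foo"],
    "demo.yaml": ["getValue", "handWritten"],
}
print(apply_ignores(recipes_by_file, config))
```

In this sketch, `legacy.yaml` is dropped entirely, while `demo.yaml` keeps only the `getValue` recipe.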

Further processing of generated code depends on the build process of the
destination language.

```{tip}
The [YAML Source Generator](yaml-extensions) serves as an example to illustrate
code generation based on `HeaderFile` contents.
```
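
In the same spirit, a toy generator might look like the sketch below. It assumes a base class with a single `generate_source` method, as described above; the actual `SourceGenerator` constructor and method signatures are not shown on this page and may differ. The stand-in classes `Func` and `Header`, the class `SummaryGenerator`, and the function name `demo_getValue` are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Func:
    """Stand-in for a resolved CLib function (subset of fields)."""
    ret_type: str
    name: str
    arglist: list[str]


@dataclass
class Header:
    """Stand-in for a header file object with resolved functions."""
    prefix: str
    funcs: list[Func] = field(default_factory=list)


class SourceGenerator(ABC):
    """Assumed shape of the base class; the real signature may differ."""

    @abstractmethod
    def generate_source(self, headers_files: list[Header]) -> None:
        """Generate source code for the given header files."""


class SummaryGenerator(SourceGenerator):
    """Toy generator that collects a plain-text summary per header."""

    def __init__(self) -> None:
        self.output: dict[str, str] = {}

    def generate_source(self, headers_files: list[Header]) -> None:
        for header in headers_files:
            lines = [f"prefix: {header.prefix}", "functions:"]
            for func in header.funcs:
                args = ", ".join(func.arglist)
                lines.append(f"- {func.ret_type} {func.name}({args})")
            # A real generator would write files; this sketch keeps text in memory.
            self.output[header.prefix] = "\n".join(lines)


generator = SummaryGenerator()
generator.generate_source(
    [Header("demo", [Func("int", "demo_getValue", ["int handle"])])]
)
print(generator.output["demo"])
```

Keeping the result in memory rather than writing files keeps the sketch self-contained; a language-specific sub-package would instead emit files suited to the build process of its destination language.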
