# Tutorial

The simplest use for the STG tools is to extract, store and compare ABI
representations.

This tutorial uses long options throughout. Equivalent short options can be
found in the manual pages for [`stg`](stg.md) and [`stgdiff`](stgdiff.md). Both
tools understand `-` as a shorthand for `/dev/stdout`.

<details>
<summary>Working Example - code and compilation</summary>

This small code sample will be used as a working example. Copy it into a file
called `tree.c`.

```c
struct N {
  struct N * left;
  struct N * right;
  int value;
};

unsigned int count(struct N * tree) {
  return tree ? count(tree->left) + count(tree->right) + 1 : 0;
}

int sum(struct N * tree) {
  return tree ? sum(tree->left) + sum(tree->right) + tree->value : 0;
}
```

Compile it:

```shell
gcc -Wall -Wextra -g -c tree.c -o tree.o
```

</details>

## Extraction from ELF / DWARF

`stg` is the tool for extracting ABI representations, though it can do more
sophisticated things as well. The simplest invocation of `stg` looks something
like this:

```shell
stg --elf library.so --output library.stg
```

Adding the `--annotate` option, which adds comments describing the nodes that
are referred to by ID, can be useful when debugging ABI issues or when
experimenting with the tools, as in this tutorial.

If the output consists of just symbols and you get a warning about missing DWARF
information, then `library.so` has no DWARF debugging information. For
meaningful results, run `stg` on an *unstripped* ELF file built with debugging
information (for example with `-g`, as in the working example), which may
require build system adjustments.

<details>
<summary>Working Example - ABI extraction</summary>

Run this:

```shell
stg --elf tree.o --annotate --output -
```

And you should get something like this:

<details>
<summary>Output</summary>

```proto
version: 0x00000002
root_id: 0x84ea5130  # interface
pointer_reference {
  id: 0x32b38621
  kind: POINTER
  pointee_type_id: 0xe08efe1a  # struct N
}
primitive {
  id: 0x4585663f
  name: "unsigned int"
  encoding: UNSIGNED_INTEGER
  bytesize: 0x00000004
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
member {
  id: 0x35cbdb23
  name: "left"
  type_id: 0x32b38621  # struct N*
}
member {
  id: 0x0b440ffb
  name: "right"
  type_id: 0x32b38621  # struct N*
  offset: 64
}
member {
  id: 0xa06f75d5
  name: "value"
  type_id: 0x6720d32f  # int
  offset: 128
}
struct_union {
  id: 0xe08efe1a
  kind: STRUCT
  name: "N"
  definition {
    bytesize: 24
    member_id: 0x35cbdb23  # struct N* left
    member_id: 0x0b440ffb  # struct N* right
    member_id: 0xa06f75d5  # int value
  }
}
function {
  id: 0x912c02a7
  return_type_id: 0x6720d32f  # int
  parameter_id: 0x32b38621  # struct N*
}
function {
  id: 0xc2779f73
  return_type_id: 0x4585663f  # unsigned int
  parameter_id: 0x32b38621  # struct N*
}
elf_symbol {
  id: 0xbb237197
  name: "count"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0xc2779f73  # unsigned int(struct N*)
  full_name: "count"
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x912c02a7  # int(struct N*)
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0xbb237197  # unsigned int count(struct N*)
  symbol_id: 0x4fdeca38  # int sum(struct N*)
}
```

</details>
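
The `offset` and `bytesize` values in the output can be cross-checked against
the compiler's own view of `struct N`. The sketch below assumes that member
offsets are reported in bits, which is consistent with the values above on a
platform with 8-byte pointers. The helper function names are hypothetical and
not part of the working example.

```c
#include <limits.h>
#include <stddef.h>

struct N {
  struct N * left;
  struct N * right;
  int value;
};

/* Offset of each member in bits, for comparison with the `offset` fields. */
size_t left_offset_bits(void) { return offsetof(struct N, left) * CHAR_BIT; }
size_t right_offset_bits(void) { return offsetof(struct N, right) * CHAR_BIT; }
size_t value_offset_bits(void) { return offsetof(struct N, value) * CHAR_BIT; }

/* Size of the whole struct in bytes, for comparison with `bytesize`. */
size_t struct_bytesize(void) { return sizeof(struct N); }
```

On a typical LP64 target with 4-byte `int`, these evaluate to 0, 64, 128 and 24
respectively. The zero offset of `left` is presumably why that member carries no
`offset` field in the output.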

</details>

## Filtering

A common goal when first starting to manage the ABI of a binary is to restrict
the interface surface to the necessary minimum. Any superfluous symbols or type
definitions in the ABI representation can result in spurious ABI differences in
later reports.

When it comes to the symbols exposed, it's common to control symbol
*visibility*. Type definitions can be either exposed in public header files or
hidden in private header files, with perhaps only public forward declarations,
but this does not remove any type definitions in the DWARF information.
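
As a rough illustration of both techniques, the sketch below marks one function
as hidden using the GCC / Clang `visibility` attribute and shows the kind of
forward declaration a public header might carry. The `API` and `INTERNAL` macros
and the `depth` helper are hypothetical and not part of the working example. In
a real project, the declarations and the definition of `struct N` would live in
separate public and private files.

```c
/* Hypothetical convenience macros, using GCC / Clang attribute syntax. */
#define API __attribute__((visibility("default")))
#define INTERNAL __attribute__((visibility("hidden")))

/* What a public header might contain: an opaque forward declaration. */
struct N;
API int sum(struct N * tree);

/* What a private header or source file might contain: the definition. */
struct N {
  struct N * left;
  struct N * right;
  int value;
};

/* Hypothetical helper that is not meant to be part of the ABI. */
INTERNAL unsigned int depth(struct N * tree) {
  if (!tree) return 0;
  unsigned int l = depth(tree->left);
  unsigned int r = depth(tree->right);
  return 1 + (l > r ? l : r);
}

API int sum(struct N * tree) {
  return tree ? sum(tree->left) + sum(tree->right) + tree->value : 0;
}
```

When this is linked into a shared library, the hidden function is left out of
the dynamic symbol table. The full definition of `struct N` is still described
in the DWARF information, which is why types need to be filtered separately.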

STG provides filtering facilities for both symbols and types, for example:

```shell
stg --files '*.h' --elf library.so --output library.stg
```

This will ensure that definitions of any types defined outside any header files,
and perhaps used as opaque pointer handles, are omitted from the ABI
representation. If you separate public and private headers, then use an
appropriate glob pattern that distinguishes the two.

Sets of symbol or file names can be read from a file, which is referred to in
the filter expression with a `:` prefix. In this example, all symbols whose
names begin with `api_`, except those listed in the `obsolete` file, are kept.

```shell
stg --symbols 'api_* & ! :obsolete' --elf library.so --output library.stg
```

For historical reasons, the format of these filter files is compatible with
libabigail's symbol list format, but this is subject to change.

```ini
[list]
 # one symbol per line
 foo # comments, whitespace and empty lines are all ignored
 bar

 baz
```

<details>
<summary>Working Example - filtering the ABI</summary>

Let's say that `struct N` is supposed to be an opaque type that user code only
gets pointers to and, additionally, the function `count` should be excluded from
the ABI (perhaps due to an argument over its return type). We can exclude the
definition of `struct N`, along with that of any other types defined in
`tree.c`, using a file filter. The symbol can be excluded by name.

Run this:

```shell
stg --elf tree.o --files '*.h' --symbols '!count' --output -
```

The result should be something like this:

<details>
<summary>Output</summary>

```proto
version: 0x00000002
root_id: 0x84ea5130
pointer_reference {
  id: 0x26944aa7
  kind: POINTER
  pointee_type_id: 0xb011cc02
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
struct_union {
  id: 0xb011cc02
  kind: STRUCT
  name: "N"
}
function {
  id: 0x9425f186
  return_type_id: 0x6720d32f
  parameter_id: 0x26944aa7
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x9425f186
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0x4fdeca38
}
```

</details>

</details>

## ABI Comparison

`stgdiff` is the tool for comparing ABI representations and reporting
differences, though it has some other, more specialised, uses. The simplest
invocation of `stgdiff` looks something like this:

```shell
stgdiff --stg old/library.stg new/library.stg --output -
```

This will report ABI differences in the default (`small`) format.

<details>
<summary>Working Example - ABI differences - small format</summary>

The function `sum` has a type that depends on `struct N`. Any change to either
might affect the ABI exposed via `sum`. For example, if the type of the `value`
member is changed to `short` and the file is recompiled, STG can detect this
difference.

First rerun the STG extraction, specifying `--output tree-old.stg`. Make the
source code change, recompile and extract the ABI with `--output tree-new.stg`.

Then run this:

```shell
stgdiff --stg tree-old.stg tree-new.stg --output -
```

To get this:

```text
type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'

```
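
The report mentions a type change but no size or offset change. The sketch
below suggests why: padding at the end of the struct absorbs the narrower
member, so the overall layout stays the same even though the member's type does
not. The names `N_old` and `N_new` are hypothetical, used only to hold both
versions of the struct side by side, and the helper functions are not part of
the working example.

```c
#include <stddef.h>

/* Original definition from the working example, renamed for comparison. */
struct N_old {
  struct N_old * left;
  struct N_old * right;
  int value;
};

/* Modified definition with the narrower member type. */
struct N_new {
  struct N_new * left;
  struct N_new * right;
  short value;
};

/* Returns 1 if both versions have the same size and `value` offset. */
int layout_unchanged(void) {
  return sizeof(struct N_old) == sizeof(struct N_new) &&
         offsetof(struct N_old, value) == offsetof(struct N_new, value);
}

/* Returns 1 if the `value` member itself has a different size. */
int value_size_changed(void) {
  return sizeof(((struct N_old *)0)->value) !=
         sizeof(((struct N_new *)0)->value);
}
```

On common targets both functions return 1. Code compiled against the old
definition would still read a full `int` from `value`, which is the kind of
incompatibility the report is flagging.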

</details>

The `small` format omits parts of the ABI graph which haven't changed.[^1] To
see all impacted nodes, use `--format flat` instead.

[^1]: The similarly named `short` format goes a bit further and will omit and
    summarise certain repetitive differences.

<details>
<summary>Working Example - ABI differences - flat format</summary>

```text
function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed

type 'struct N' changed
  member 'struct N* left' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member 'struct N* right' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'

```

</details>

And if you really want to see more of the graph structure, use `--format plain`.

<details>
<summary>Working Example - ABI differences - plain format</summary>

```text
function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed
        member 'struct N* left' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member 'struct N* right' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member changed from 'int value' to 'short int value'
          type changed from 'int' to 'short int'

```

</details>

Or just use `--format viz`, which generates input for
[Graphviz](https://graphviz.org/).

<details>
<summary>Working Example - ABI differences - viz format</summary>

```dot
digraph "ABI diff" {
  "0" [shape=rectangle, label="'interface'"]
  "1" [label="'int sum(struct N*)'"]
  "2" [label="'int(struct N*)'"]
  "3" [label="'struct N*'"]
  "4" [shape=rectangle, label="'struct N'"]
  "5" [label="'struct N* left'"]
  "5" -> "3" [label=""]
  "4" -> "5" [label=""]
  "6" [label="'struct N* right'"]
  "6" -> "3" [label=""]
  "4" -> "6" [label=""]
  "7" [label="'int value' → 'short int value'"]
  "8" [color=red, label="'int' → 'short int'"]
  "7" -> "8" [label=""]
  "4" -> "7" [label=""]
  "3" -> "4" [label="pointed-to"]
  "2" -> "3" [label="parameter 1"]
  "1" -> "2" [label=""]
  "0" -> "1" [label=""]
}
```

</details>
