Contact Network

Contents:

Contact Network: a graph \(G\) that captures the contacts between individuals.
Edges: the edges of the contact network
Meta Data: information describing network specifics
Encoding: EpiHiper supports a binary format and an ASCII format.
Examples
Partitioning

Introduction

Synopsis

Contact Network: a graph \(G\) that captures the contacts between individuals.

The EpiHiper contact network \(G\) is a graph that captures the contacts between individuals. It captures the specifics of those contacts as listed in Table 6. Edges are directed and endpoints are referred to as sources and targets. The edges in the network must be sorted by targetPID.
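Because the edges must be sorted by targetPID, a loader can verify the ordering in a single pass. The following minimal Python sketch assumes the edges are available as `(targetPID, sourcePID)` tuples; the function name `is_sorted_by_target` is hypothetical. Several edges may share a target, so only a decreasing targetPID is rejected.

```python
from typing import Iterable, Tuple

def is_sorted_by_target(edges: Iterable[Tuple[int, int]]) -> bool:
    """Return True if the edges appear in non-decreasing targetPID order.

    Each edge is a hypothetical (targetPID, sourcePID) tuple; only the
    first element is inspected.
    """
    previous = None
    for target_pid, _source_pid in edges:
        if previous is not None and target_pid < previous:
            return False
        previous = target_pid
    return True

print(is_sorted_by_target([(0, 10105), (0, 10905), (1, 7)]))  # True
print(is_sorted_by_target([(1, 7), (0, 10105)]))              # False
```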

Remark: Note again that \(G\) only captures contacts and their durations; it does not capture when these happen within an iteration. One of the model's assumptions is that the order of contacts within an iteration does not matter.

We point out that edges are not aggregated when the contact network is constructed. A pair of individuals may be in contact at different times with precisely the same characteristics at those times (e.g., household members in contact at the beginning and at the end of a day); such contacts are represented by separate edges rather than combined into a single edge with the accumulated contact time. The EpiHiper transmission process treats the two representations equivalently.
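To see why separate edges and one accumulated edge can be equivalent, consider a transmission model in which each contact contributes a propensity proportional to its duration, and the propensities are summed before computing a transmission probability of the form \(1 - e^{-\sum \rho}\). This functional form and the names `propensity`, `transmission_probability`, and `rate` are assumptions for illustration only, not EpiHiper's exact transmission formula.

```python
import math

def propensity(duration: float, rate: float) -> float:
    # Assumption: the propensity grows linearly with the contact duration.
    return rate * duration

def transmission_probability(durations, rate: float) -> float:
    # Assumption: propensities of all contacts are summed before exponentiation.
    total = sum(propensity(d, rate) for d in durations)
    return 1.0 - math.exp(-total)

rate = 1e-4
split = transmission_probability([900, 1800], rate)   # two separate edges
combined = transmission_probability([2700], rate)     # one accumulated edge
print(math.isclose(split, combined))  # True
```

Under this assumed model, splitting a contact into several edges changes nothing, because the sum of linear propensities equals the propensity of the accumulated duration.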

Remark: Isolated vertices have no bearing on the disease dynamics and are not represented in the EpiHiper networks.

Edges

Synopsis

Edges: the edges of the contact network

To define the edges of the contact network, the following syntax is used:

edge: targetPID targetActivity sourcePID sourceActivity duration
      [locationID] [edgeTrait] [active] [weight]

Each edge in the contact network has the attributes given in Table 6. Attributes in square brackets are optional.

Table 6 Edge properties in the EpiHiper network format.
Name	Type	Description
targetPID	\(n \in \mathbb{N}_0\)	The PID of the target node
targetActivity	activityTrait	The activity of the target node at time of contact
sourcePID	\(n \in \mathbb{N}_0\)	The PID of the source node
sourceActivity	activityTrait	The activity of the source node at time of contact
duration	\(n \in \mathbb{N}_0\)	Duration of the contact, measured in number of subticks
[locationID]	\(n \in \mathbb{N}_0\)	The location ID where the contact takes place
[edgeTrait]	edgeTrait	A per-network customizable bit-field of edge parameters
[active]	Boolean	A Boolean value specifying if the given edge is active
[weight]	\(0 \le x \in \mathbb{R}\)	A weight given to each edge.
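For a loader written in Python, an edge record could be modeled as a dataclass mirroring Table 6, with the optional attributes defaulting to `None`. This is a sketch, not EpiHiper's own data structure; the class name `Edge`, the camelCase field names copied from the table, and the choice to hold activities and edge traits as raw integers (the binary `bitset<32>` value) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    """One contact-network edge with the attributes of Table 6 (hypothetical class)."""
    targetPID: int
    targetActivity: int          # raw 32-bit activity trait value (assumption)
    sourcePID: int
    sourceActivity: int          # raw 32-bit activity trait value (assumption)
    duration: float
    locationID: Optional[int] = None
    edgeTrait: Optional[int] = None
    active: Optional[bool] = None
    weight: Optional[float] = None

    def __post_init__(self) -> None:
        # PIDs, durations, location IDs and weights are all non-negative.
        for name in ("targetPID", "sourcePID", "duration"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")
        if self.locationID is not None and self.locationID < 0:
            raise ValueError("locationID must be non-negative")
        if self.weight is not None and self.weight < 0:
            raise ValueError("weight must be non-negative")

edge = Edge(0, 0, 10105, 0, 900, locationID=7692)
print(edge.locationID, edge.weight)  # 7692 None
```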

Meta Data

Synopsis

Meta Data: information describing network specifics

To define the meta data of the contact network, the following syntax is used:

: encoding accumulationTime timeResolution numberOfNodes numberOfEdges
  sizeofPID sizeofActivity activityEncoding sizeofEdgeTrait traitEncoding
  hasActiveField hasWeightField hasLocationIDField [annotation]

Table 7 List of meta data attributes
JSON property	Description
encoding	binary or text
accumulationTime	An annotation string specifying the duration of network accumulation (default 24 hours)
timeResolution	The maximal value of the duration field of the network edges; captures the resolution used in the network accumulation per tick.
numberOfNodes	The number of nodes in the network
numberOfEdges	The number of edges in the network
sizeofPID	The size of the PIDs measured in bytes
sizeofActivity	The size of the activities measured in bytes (currently 4)
activityEncoding	JSON trait for encoding of activity type
sizeofEdgeTrait	The size of the edgeTrait measured in bytes (currently 0 or 4)
traitEncoding	JSON trait for encoding of edge features
hasActiveField	Boolean flag stating if active is included as edge field
hasWeightField	Boolean flag stating if weight is included as edge field
hasLocationIDField	Boolean flag stating if a location ID is included as edge field
ann:*	Annotations for the network (properties whose names start with ann:)
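Since the meta data is a single-line JSON object, it can be read with a standard JSON parser and checked for the properties listed in Table 7. The helper `parse_meta_data` below is a hypothetical sketch: it treats `accumulationTime` as optional because the table documents a default, collects `ann:*` properties separately, and accepts either `traitEncoding` (the name in Table 7) or `edgeTraitEncoding` (the name used in the example header in this document).

```python
import json

# Properties from the meta data syntax that have no documented default.
REQUIRED = (
    "encoding", "timeResolution", "numberOfNodes", "numberOfEdges",
    "sizeofPID", "sizeofActivity", "activityEncoding", "sizeofEdgeTrait",
    "hasActiveField", "hasWeightField", "hasLocationIDField",
)

def parse_meta_data(first_line: str):
    """Parse the JSON header line; return (meta, annotations) (hypothetical helper)."""
    meta = json.loads(first_line)
    missing = [key for key in REQUIRED if key not in meta]
    # The edge trait encoding is spelled two ways in this document; accept either.
    if "traitEncoding" not in meta and "edgeTraitEncoding" not in meta:
        missing.append("traitEncoding")
    if missing:
        raise ValueError(f"missing meta data: {', '.join(missing)}")
    if meta["encoding"] not in ("binary", "text"):
        raise ValueError("encoding must be 'binary' or 'text'")
    meta.setdefault("accumulationTime", "24 hours")  # documented default
    annotations = {k: v for k, v in meta.items() if k.startswith("ann:")}
    return meta, annotations
```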

Encoding

Synopsis

Encoding: EpiHiper supports a binary format and an ASCII format.

EpiHiper supports a binary format and an ASCII format, both of which share common meta data. The meta data appears as the first line of the file in both formats; it is a JSON object following the EpiHiper network schema, with all newline characters and redundant whitespace removed. The second line contains the column headers in both formats. Optional attributes (shown in square brackets, [...]) are omitted from the encoding when the network does not include them.

targetPID,targetActivity,sourcePID,sourceActivity,duration
[,locationID][,edgeTrait][,active][,weight]
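Which columns are present follows from the meta data flags. The hypothetical helper below lists the expected column names in the field order of Table 8. It assumes that edgeTrait is present exactly when `sizeofEdgeTrait` is greater than 0, which the text does not state explicitly. Note also that the text example in this document labels the location column `LID`, so a reader matching columns by name may need to accept that spelling.

```python
def column_headers(meta: dict) -> list:
    """Expected column names for an edge, derived from the meta data flags."""
    columns = ["targetPID", "targetActivity", "sourcePID", "sourceActivity", "duration"]
    if meta.get("hasLocationIDField"):
        columns.append("locationID")
    # Assumption: an edge trait column exists only when it has a non-zero size.
    if meta.get("sizeofEdgeTrait", 0) > 0:
        columns.append("edgeTrait")
    if meta.get("hasActiveField"):
        columns.append("active")
    if meta.get("hasWeightField"):
        columns.append("weight")
    return columns

print(column_headers({"hasLocationIDField": True, "sizeofEdgeTrait": 0}))
# ['targetPID', 'targetActivity', 'sourcePID', 'sourceActivity', 'duration', 'locationID']
```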

Table 8 Edge attribute encoding. In both the ASCII and the binary format, the fields appear in the same order as the rows of this table (top to bottom).
Name	Binary	Text
targetPID	size_t	\(n \in \mathbb{N}_0\)
targetActivity	bitset<32>	trait encoding
sourcePID	size_t	\(n \in \mathbb{N}_0\)
sourceActivity	bitset<32>	trait encoding
duration	double	\(0 \le x \in \mathbb{R}\)
[locationID]	size_t	\(n \in \mathbb{N}_0\)
[edgeTrait]	bitset<32>	trait encoding
[active]	bool	(0 or 1)
[weight]	double	\(0 \le x \in \mathbb{R}\)

ASCII format: After removal of the first line of the file (the common meta data line), the remaining file is a valid CSV file.
Binary format: To avoid string interpretation, and thus to speed up loading of the network, EpiHiper supports binary edge encoding. The order of the attributes is the same as in the CSV file. Note that, due to data alignment in C, the size of a binary-encoded edge will be larger than the sum of the attribute sizes.
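The alignment remark can be illustrated with Python's `struct` module, whose native mode (`@`) applies the platform's C alignment rules. The sketch assumes a record laid out exactly in Table 8 order, with `size_t` as `N`, a 4-byte activity/edge trait as `I` (matching `sizeofActivity` of 4), `bool` as `?` and `double` as `d`; EpiHiper's actual C++ struct, and therefore its real record size, may differ. The names `edge_format` and `unpadded_size` are hypothetical.

```python
import struct

# Native ("@") struct codes per attribute, in the order of Table 8.
# Assumed mapping: size_t -> "N", 32-bit activity/trait -> "I", bool -> "?", double -> "d".
REQUIRED_CODES = "NINId"  # targetPID, targetActivity, sourcePID, sourceActivity, duration
OPTIONAL_CODES = {"locationID": "N", "edgeTrait": "I", "active": "?", "weight": "d"}

def _codes(optional) -> str:
    # Optional fields are appended in table order, regardless of the caller's order.
    return REQUIRED_CODES + "".join(
        code for name, code in OPTIONAL_CODES.items() if name in optional)

def edge_format(optional=()) -> str:
    """Native-aligned struct format for one binary edge (hypothetical layout)."""
    # "0N" pads the end of the record to size_t alignment, like C struct padding.
    return "@" + _codes(optional) + "0N"

def unpadded_size(optional=()) -> int:
    """Sum of the individual attribute sizes, ignoring alignment."""
    return sum(struct.calcsize("@" + code) for code in _codes(optional))

fmt = edge_format(["locationID"])
print(struct.calcsize(fmt), unpadded_size(["locationID"]))  # 48 40 on typical 64-bit platforms

# Round trip of the first example edge; the activity values are placeholders.
record = struct.pack(fmt, 0, 0, 10105, 0, 900.0, 7692)
print(struct.unpack(fmt, record))  # (0, 0, 10105, 0, 900.0, 7692)
```

On a typical 64-bit platform, 4 bytes of padding follow each 4-byte activity so that the next 8-byte field stays aligned, which is where the extra size comes from.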

Examples

JSON network header (first line), formatted for better readability:

{
  "$schema": "https://raw.githubusercontent.com/NSSAC/EpiHiper-Schema/master/schema/networkSchema.json",
  "epiHiperSchema": "https://raw.githubusercontent.com/NSSAC/EpiHiper-Schema/master/schema/networkSchema.json",
  "ann:label": "Wyoming(2017) - config_min_5_max_100_alpha_400 Wednesday network",
  "encoding": "text",
  "accumulationTime": "24 hours",
  "timeResolution": 86400,
  "numberOfNodes": 544276,
  "numberOfEdges": 27747598,
  "sizeofPID": 8,
  "sizeofActivity": 4,
  "activityEncoding": {
    "id": "activityTrait",
    "features": [
      {
        "id": "activityType",
        "default": "other",
        "enums": [
          {
            "id": "home"
          },
          {
            "id": "work"
          },
          {
            "id": "shop"
          },
          {
            "id": "other"
          },
          {
            "id": "school"
          },
          {
            "id": "college"
          },
          {
            "id": "religion"
          }
        ]
      }
    ]
  },
  "sizeofEdgeTrait": 0,
  "edgeTraitEncoding": {
    "id": "edgeTrait",
    "features": []
  },
  "hasLocationIDField": true,
  "hasActiveField": false,
  "hasWeightField": false
}

Text encoding, starting with line 2 (the CSV column headers):

targetPID,targetActivity,sourcePID,sourceActivity,duration,LID
0,1:2,10105,1:2,900,7692
0,1:2,10905,1:2,1800,7692
0,1:2,11094,1:2,6840,7692
0,1:2,11134,1:2,1800,7692
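Because the text encoding is valid CSV once the first line is removed, Python's standard `csv` module can read it directly. The helper `read_text_network` is a hypothetical sketch; it reads the JSON header from line 1, then hands the rest of the stream to `csv.DictReader`, which takes its field names from line 2. Activities are left as strings, since decoding the trait encoding is out of scope here. The example rows are the ones shown above, with a stand-in header in place of the full JSON object.

```python
import csv
import io
import json

def read_text_network(stream):
    """Split a text-encoded network into meta data and edge rows (hypothetical helper)."""
    meta = json.loads(stream.readline())  # line 1: the single-line JSON header
    rows = list(csv.DictReader(stream))   # line 2 on: CSV with column headers
    return meta, rows

example = (
    '{"encoding":"text"}\n'  # stand-in for the full header
    "targetPID,targetActivity,sourcePID,sourceActivity,duration,LID\n"
    "0,1:2,10105,1:2,900,7692\n"
    "0,1:2,10905,1:2,1800,7692\n"
    "0,1:2,11094,1:2,6840,7692\n"
    "0,1:2,11134,1:2,1800,7692\n"
)
meta, rows = read_text_network(io.StringIO(example))
total = sum(int(row["duration"]) for row in rows)
print(len(rows), rows[0]["sourcePID"], total)  # 4 10105 11340
```

For a file on disk, the same helper works with a stream opened as `open(path, newline="")`, as the `csv` module documentation recommends.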

Partitioning

The network for EpiHiper may be partitioned prior to computation. For large networks, an existing partition matching the number of compute processes and/or threads is in fact preferred. The format of a network partition is identical to that of the network, except that the JSON header includes additional information about the partition.

To define the meta data of the partition, the following syntax is added:

partition:   numberOfNodes numberOfEdges numberOfParts firstLocalNode beyondLocalNode

Table 9 List of partition meta data attributes
JSON property	Description
numberOfNodes	number of nodes in the partition
numberOfEdges	number of edges in the partition
numberOfParts	the total number of parts of the partition
firstLocalNode	`targetPID` of the first node in the partition
beyondLocalNode	`targetPID` of the last node incremented by 1

Each part of the partition is stored in a file whose name is the file name of the unpartitioned network with .N appended, where \(N\) is the zero-based index of the part. Edges in a partition must be sorted by targetPID, and the ranges \([firstLocalNode, beyondLocalNode)\) must be non-overlapping and increasing with the part index.
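The naming convention and the range constraints can be checked mechanically. In the sketch below, the helpers `part_file_name`, `check_ranges`, and `owning_part`, the file name `network.txt`, and the range values are all hypothetical; the lookup uses a binary search over the `firstLocalNode` values to find the part whose half-open range contains a given targetPID.

```python
import bisect

def part_file_name(network_file: str, index: int) -> str:
    # Part N: the unpartitioned file name with ".N" appended (N starts at 0).
    return f"{network_file}.{index}"

def check_ranges(ranges) -> None:
    """Raise if the [firstLocalNode, beyondLocalNode) ranges are malformed,
    overlapping, or not increasing with the part index."""
    for first, beyond in ranges:
        if first > beyond:
            raise ValueError(f"invalid range [{first}, {beyond})")
    for (_, beyond_prev), (first_next, _) in zip(ranges, ranges[1:]):
        if first_next < beyond_prev:
            raise ValueError("ranges overlap or are not increasing")

def owning_part(ranges, target_pid: int) -> int:
    """Index of the part whose range contains target_pid, or -1 if none does."""
    firsts = [first for first, _ in ranges]
    index = bisect.bisect_right(firsts, target_pid) - 1
    if index >= 0 and target_pid < ranges[index][1]:
        return index
    return -1

ranges = [(0, 200000), (200000, 400000), (400000, 544276)]  # illustrative values
check_ranges(ranges)
print(part_file_name("network.txt", 2))  # network.txt.2
print(owning_part(ranges, 250000))       # 1
```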