Contact Network
Introduction
Synopsis
Contact Network: a graph \(G\) that captures the contacts between individuals.
The EpiHiper contact network \(G\) is a graph that captures the contacts between individuals. It captures the specifics of those contacts as listed in Table 6. Edges are directed and endpoints are referred to as sources and targets. The edges in the network must be sorted by targetPID.
Remark: Note again that \(G\) only captures contacts and their durations; it does not capture when these happen within an iteration. One of the assumptions of the model is that the order of contacts within an iteration does not matter.
There is no aggregation of edges in the construction of the contact network. A pair of individuals may be in contact at different times with precisely the same characteristics at those times (e.g., household members in contact at the beginning and end of the day); those contacts are represented by separate edges and are not combined into a single edge with the accumulated time of contact. The EpiHiper transmission process handles both cases so that they are equivalent.
Remark: Isolated vertices have no bearing on the disease dynamics and are not represented in the EpiHiper networks.
Edges
Synopsis
Edges: the edges of the contact network
To define the edges of the contact network, the following syntax is used:
edge: targetPID targetActivity sourcePID sourceActivity duration
[locationID] [edgeTrait] [active] [weight]
Each edge in the contact network has the attributes given in Table 6.
Attributes

| Name | Type | Description |
|---|---|---|
| targetPID | \(n \in \mathbb{N}_0\) | The PID of the target node |
| targetActivity | | The activity of the target node at time of contact |
| sourcePID | \(n \in \mathbb{N}_0\) | The PID of the source node |
| sourceActivity | | The activity of the source node at time of contact |
| duration | \(n \in \mathbb{N}_0\) | Duration of contact in unit number-of-subticks |
| [locationID] | \(n \in \mathbb{N}_0\) | The location ID where the contact takes place |
| [edgeTrait] | | A per-network customizable bit-field of edge parameters |
| [active] | Boolean | A Boolean value specifying if the given edge is active |
| [weight] | \(0 \le x \in \mathbb{R}\) | A weight given to each edge. |
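As an illustration of the attribute list above, an edge can be modelled as a small record type. This is a minimal sketch, not EpiHiper code: the class name `Edge` is hypothetical, activities and edge traits are kept as opaque strings (an assumption, since their types are not given here), and optional attributes default to `None` when absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """One directed contact edge; field names follow the attribute table."""

    targetPID: int
    targetActivity: str  # assumption: kept as an opaque string
    sourcePID: int
    sourceActivity: str  # assumption: kept as an opaque string
    duration: int  # number of subticks
    locationID: Optional[int] = None
    edgeTrait: Optional[str] = None  # assumption: kept as an opaque string
    active: Optional[bool] = None
    weight: Optional[float] = None

    def __post_init__(self):
        # PIDs and duration are natural numbers (including zero).
        for name in ("targetPID", "sourcePID", "duration"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")
        if self.locationID is not None and self.locationID < 0:
            raise ValueError("locationID must be non-negative")
        # The weight, when present, is a non-negative real number.
        if self.weight is not None and self.weight < 0:
            raise ValueError("weight must be non-negative")


edge = Edge(0, "1:2", 10105, "1:2", 900, locationID=7692)
```

Freezing the record keeps edges hashable and guards against accidental modification after validation.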
Meta Data
Synopsis
Meta Data: information describing network specifics
To define the meta data of the contact network, the following syntax is used:
: encoding accumulationTime timeResolution numberOfNodes numberOfEdges
sizeofPID sizeofActivity activityEncoding sizeofEdgeTrait traitEncoding
hasActiveField hasWeightField hasLocationIDField [annotation]
| JSON property | Description |
|---|---|
| encoding | binary or text |
| accumulationTime | An annotation string specifying the duration of network accumulation (default 24 hours) |
| timeResolution | The maximal value of the duration field of the network edges; captures the resolution used in the network accumulation per tick. |
| numberOfNodes | The number of nodes in the network |
| numberOfEdges | The number of edges in the network |
| sizeofPID | The size of the PIDs measured in bytes |
| sizeofActivity | The size of the activities measured in bytes (currently 4) |
| activityEncoding | JSON trait for encoding of activity type |
| sizeofEdgeTrait | The size of the edgeTrait measured in bytes (currently 0 or 4) |
| traitEncoding | JSON trait for encoding of edge features |
| hasActiveField | Boolean flag stating if active is included as edge field |
| hasWeightField | Boolean flag stating if weight is included as edge field |
| hasLocationIDField | Boolean flag stating if a location ID is included as edge field |
| ann:* | annotation for the network |
Encoding
Synopsis
Encoding: EpiHiper supports a binary format and an ASCII format.
EpiHiper supports a binary format and an ASCII format, both of which share common meta data. The meta data appears as the first line of the file in both formats; it is a standardized JSON header with all newline characters and redundant whitespace characters omitted. The second line contains the column headers in both formats. Optional attributes ([…]) that are not present in the network are omitted from the encoding.
targetPID,targetActivity,sourcePID,sourceActivity,duration
[,locationID][,edgeTrait][,active][,weight]
| Name | Binary | Text |
|---|---|---|
| targetPID | size_t | \(n \in \mathbb{N}_0\) |
| targetActivity | bitset<32> | |
| sourcePID | size_t | \(n \in \mathbb{N}_0\) |
| sourceActivity | bitset<32> | |
| duration | double | \(0 \le x \in \mathbb{R}\) |
| [locationID] | size_t | \(n \in \mathbb{N}_0\) |
| [edgeTrait] | bitset<32> | |
| [active] | bool | (0 or 1) |
| [weight] | double | \(0 \le x \in \mathbb{R}\) |
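The set of columns an edge carries can be derived from the meta data flags. The sketch below is an illustration under stated assumptions: the function name `expected_columns` is hypothetical, the column order follows the attribute table above, and treating `sizeofEdgeTrait > 0` as "the edgeTrait column is present" is an assumption, since no dedicated flag for it is documented here. Real files may label columns differently from the attribute names used in this sketch.

```python
def expected_columns(meta: dict) -> list:
    """Return the edge columns implied by the meta data flags."""
    # Mandatory attributes, in the order of the attribute table.
    columns = [
        "targetPID",
        "targetActivity",
        "sourcePID",
        "sourceActivity",
        "duration",
    ]
    if meta.get("hasLocationIDField", False):
        columns.append("locationID")
    # Assumption: a non-zero trait size means the edgeTrait column exists.
    if meta.get("sizeofEdgeTrait", 0) > 0:
        columns.append("edgeTrait")
    if meta.get("hasActiveField", False):
        columns.append("active")
    if meta.get("hasWeightField", False):
        columns.append("weight")
    return columns


columns = expected_columns(
    {
        "sizeofEdgeTrait": 0,
        "hasLocationIDField": True,
        "hasActiveField": False,
        "hasWeightField": False,
    }
)
```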
- ASCII format
After removal of the first line of the file (the common header line), the remaining file is a valid CSV file.
- Binary format
To avoid string interpretation and thus speed up loading of the network, EpiHiper supports binary edge encoding. The order of the attributes is the same as in the CSV file. Note that, due to data alignment in C, the size of the binary encoded edge will be larger than the sum of the attribute sizes.
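The effect of C data alignment on the record size can be illustrated with Python's `ctypes`, which lays out structures using the platform's native C alignment rules. The layout below is hypothetical: it assumes 8-byte PIDs, 4-byte activities, and an 8-byte duration, with no optional attributes, and it is not taken from the EpiHiper sources.

```python
import ctypes


class EdgeRecord(ctypes.Structure):
    # Hypothetical layout of the mandatory attributes only.
    _fields_ = [
        ("targetPID", ctypes.c_uint64),
        ("targetActivity", ctypes.c_uint32),
        ("sourcePID", ctypes.c_uint64),
        ("sourceActivity", ctypes.c_uint32),
        ("duration", ctypes.c_double),
    ]


def field_size_sum(struct_type) -> int:
    """Sum of the individual field sizes, ignoring any padding."""
    return sum(ctypes.sizeof(ftype) for _, ftype in struct_type._fields_)


unpadded_size = field_size_sum(EdgeRecord)  # 8 + 4 + 8 + 4 + 8 bytes
aligned_size = ctypes.sizeof(EdgeRecord)  # includes alignment padding, if any
```

On platforms where 8-byte fields are aligned to 8-byte boundaries, padding is inserted after each 4-byte activity field, so `aligned_size` exceeds `unpadded_size`. A reader of the binary format therefore has to use the aligned record size, not the sum of the attribute sizes, when stepping from one edge to the next.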
Examples
JSON graph header (first row) formatted for better readability:
{
  "$schema": "https://raw.githubusercontent.com/NSSAC/EpiHiper-Schema/master/schema/networkSchema.json",
  "epiHiperSchema": "https://raw.githubusercontent.com/NSSAC/EpiHiper-Schema/master/schema/networkSchema.json",
  "ann:label": "Wyoming(2017) - config_min_5_max_100_alpha_400 Wednesday network",
  "encoding": "text",
  "accumulationTime": "24 hours",
  "timeResolution": 86400,
  "numberOfNodes": 544276,
  "numberOfEdges": 27747598,
  "sizeofPID": 8,
  "sizeofActivity": 4,
  "activityEncoding": {
    "id": "activityTrait",
    "features": [
      {
        "id": "activityType",
        "default": "other",
        "enums": [
          { "id": "home" },
          { "id": "work" },
          { "id": "shop" },
          { "id": "other" },
          { "id": "school" },
          { "id": "college" },
          { "id": "religion" }
        ]
      }
    ]
  },
  "sizeofEdgeTrait": 0,
  "edgeTraitEncoding": {
    "id": "edgeTrait",
    "features": []
  },
  "hasLocationIDField": true,
  "hasActiveField": false,
  "hasWeightField": false
}
Text encoding, starting with row 2 (the CSV column headers):
targetPID,targetActivity,sourcePID,sourceActivity,duration,LID
0,1:2,10105,1:2,900,7692
0,1:2,10905,1:2,1800,7692
0,1:2,11094,1:2,6840,7692
0,1:2,11134,1:2,1800,7692
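A text-encoded network of this shape can be read with the standard library alone: the first line is parsed as JSON, and the remainder is parsed as CSV. The following is a minimal sketch, not the EpiHiper loader. The function name `read_text_network` is hypothetical, the sample header is abbreviated to two properties for brevity, and the whole input is held in memory, which would not be practical for a network with millions of edges.

```python
import csv
import io
import json


def read_text_network(text: str):
    """Return (meta, rows) for a text-encoded network given as one string."""
    # Line 1 is the JSON header; everything after it is plain CSV.
    header_line, _, body = text.partition("\n")
    meta = json.loads(header_line)
    if meta.get("encoding") != "text":
        raise ValueError("expected a text-encoded network")
    rows = list(csv.DictReader(io.StringIO(body)))
    # Edges must be sorted by targetPID.
    previous = -1
    for row in rows:
        target = int(row["targetPID"])
        if target < previous:
            raise ValueError("edges are not sorted by targetPID")
        previous = target
    return meta, rows


# Abbreviated sample: a real header carries all meta data properties.
sample = (
    '{"encoding":"text","numberOfEdges":2}\n'
    "targetPID,targetActivity,sourcePID,sourceActivity,duration,LID\n"
    "0,1:2,10105,1:2,900,7692\n"
    "0,1:2,10905,1:2,1800,7692\n"
)
meta, rows = read_text_network(sample)
```

The rows are returned as dictionaries of strings keyed by the column headers found in the file; converting activity values is left out because their textual encoding is not specified in this section.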
Partitioning
The network for EpiHiper may be partitioned prior to computation. In fact, for large networks, an existing partition matching the number of compute processes and/or threads is preferred. The format of a network partition is identical to that of the network, except that it includes additional information about the partition in the JSON header.
To define the meta data of the partition, the following syntax is added:
partition: numberOfNodes numberOfEdges numberOfParts firstLocalNode beyondLocalNode
| JSON property | Description |
|---|---|
| numberOfNodes | number of nodes in the partition |
| numberOfEdges | number of edges in the partition |
| numberOfParts | the total number of parts of the partition |
| firstLocalNode | targetPID of the first node in the partition |
| beyondLocalNode | targetPID of the last node incremented by 1 |
By convention, each part of the partition is named after the file name of the unpartitioned network with .N appended, where \(N\) is the index of the part, starting with zero. Edges in a partition must be sorted by targetPID, and the ranges \([firstLocalNode, beyondLocalNode)\) must be non-overlapping and increasing with the index.
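The naming convention and the range constraints can be checked with a few lines of code. This is a sketch under stated assumptions: the helper names `part_file_name` and `check_part_ranges` and the file name `network.txt` are hypothetical, and allowing an empty part (firstLocalNode equal to beyondLocalNode) is an assumption not confirmed by the text above.

```python
def part_file_name(network_file: str, index: int) -> str:
    """Name of part `index` (starting at zero) of a partitioned network."""
    return f"{network_file}.{index}"


def check_part_ranges(ranges) -> None:
    """Validate half-open ranges [firstLocalNode, beyondLocalNode).

    `ranges` lists one (firstLocalNode, beyondLocalNode) pair per part,
    ordered by part index. Raises ValueError on the first violation.
    """
    previous_beyond = None
    for index, (first, beyond) in enumerate(ranges):
        # Assumption: an empty range (first == beyond) is tolerated.
        if first > beyond:
            raise ValueError(f"part {index}: range is reversed")
        # Ranges must not overlap and must increase with the index.
        if previous_beyond is not None and first < previous_beyond:
            raise ValueError(f"part {index}: overlaps the previous part")
        previous_beyond = beyond


names = [part_file_name("network.txt", i) for i in range(3)]
check_part_ranges([(0, 100), (100, 250), (250, 400)])
```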