Data Import — File Formats
TurboLynx loads datasets via the bulkload tool. Two file formats are supported: CSV and JSON.
CSV
Delimiter
Fields are separated by a pipe character (|), not a comma.
Header Row
The first row is always a header. Each column is declared in the form `name:TYPE` (for example, `firstName:STRING`).
The header annotation drives schema inference — there is no separate schema file.
ID Column Annotations
Special column annotations identify vertex IDs and edge endpoints.
Vertex ID — :ID(Label)
- The label inside `(...)` is the vertex type name used for cross-referencing edge files.
- The ID value must be a non-negative integer (`UBIGINT` internally).
- Each vertex type has its own ID namespace; IDs only need to be unique within a type.
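For example, a minimal Person vertex file might begin like this (the property column is illustrative):

```
:ID(Person)|firstName:STRING
1|Alice
2|Bob
```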
Compound (multi-column) vertex ID — :ID_1(Label) / :ID_2(Label)
When a vertex is identified by two columns, use _1 and _2 suffixes:
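For example (the `City` label, `name` column, and values are illustrative):

```
:ID_1(City)|:ID_2(City)|name:STRING
10|200|Centerville
```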
Edge source / destination — :START_ID(Label) / :END_ID(Label)
- Values must match IDs declared in the corresponding vertex file.
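For example, a header for an edge between Person vertices might look like this (the `since` property is illustrative):

```
:START_ID(Person)|:END_ID(Person)|since:DATE
1|2|2022-01-15
```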
Property Column Types
| CSV type annotation | Internal type | Notes |
|---|---|---|
| `STRING` | `VARCHAR` | UTF-8 text |
| `STRING[]` | `VARCHAR` | Multi-value stored as plain text |
| `INT` | `INTEGER` | 32-bit signed integer |
| `INTEGER` | `INTEGER` | Alias for `INT` |
| `LONG` | `BIGINT` | 64-bit signed integer |
| `BIGINT` | `BIGINT` | Alias for `LONG` |
| `ULONG` | `UBIGINT` | 64-bit unsigned integer |
| `UBIGINT` | `UBIGINT` | Alias for `ULONG` |
| `FLOAT` | `FLOAT` | 32-bit IEEE 754 floating-point |
| `DOUBLE` | `DOUBLE` | 64-bit IEEE 754 floating-point |
| `BOOLEAN` | `BOOLEAN` | `true` / `false` (JSON only; not yet supported in CSV) |
| `DATE` | `DATE` | Calendar date — see Date Format |
| `DATE_EPOCHMS` | `DATE` | Milliseconds since Unix epoch — see Epoch Milliseconds |
| `DECIMAL(p,s)` | `DECIMAL` | Fixed-point — see Decimal Format |
Date Format
Type annotation: DATE
Accepted input format: ISO 8601 date
Examples: `2022-01-15`, `2023-06-01`
Epoch Milliseconds
Type annotation: DATE_EPOCHMS
The value is an integer representing milliseconds since the Unix epoch (1970-01-01 00:00:00 UTC). The parser divides the value by 1000 to obtain a Unix timestamp in seconds, then converts to a calendar date.
Note: Sub-second precision is truncated when converting to a date.
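The conversion described above can be modeled in Python (a sketch of the described behavior, not the parser's actual code):

```python
from datetime import datetime, timezone

def epochms_to_date(ms: int):
    """Model of the described conversion: integer-divide the millisecond
    value by 1000 to get whole seconds, then take the UTC calendar date.
    Sub-second precision is truncated by the integer division."""
    seconds = ms // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date()

# 2024-03-15 10:30:00.123 UTC expressed as epoch milliseconds
print(epochms_to_date(1710498600123))  # -> 2024-03-15
```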
Timestamp Format
TurboLynx uses the TIMESTAMP type internally (microsecond resolution, stored as int64_t).
When a column is declared DATE_EPOCHMS, the raw integer milliseconds value is accepted.
For string-formatted timestamps (used in queries and future CSV extensions), the parser accepts ISO 8601 with the following rules:
| Component | Description |
|---|---|
| `YYYY-MM-DD` | Date part (required) |
| `T` or a space | Separator between date and time (either is accepted) |
| `HH:MM:SS` | Time part in 24-hour clock |
| `.mmm` | Optional milliseconds (1–3 digits) |
| `Z` | Optional UTC suffix |
| `+HH:MM` / `-HH:MM` | Optional UTC offset; offsets are subtracted to normalize to UTC |
Examples of valid timestamp strings:

- `2024-03-15 10:30:00`
- `2024-03-15T10:30:00`
- `2024-03-15T10:30:00.123`
- `2024-03-15T10:30:00Z`
- `2024-03-15T10:30:00+09:00`
- `2024-03-15T10:30:00-05:30`

A bare date (`2024-03-15`) is also valid and is interpreted as midnight UTC.
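The offset-normalization rule can be checked with Python's `datetime` (illustrative only; Python is not part of TurboLynx):

```python
from datetime import datetime, timezone

# +09:00 means the local clock is 9 hours ahead of UTC, so the offset is
# subtracted to normalize: 10:30 at +09:00 is 01:30 UTC on the same day.
ts = datetime.fromisoformat("2024-03-15T10:30:00+09:00")
print(ts.astimezone(timezone.utc).isoformat())  # -> 2024-03-15T01:30:00+00:00
```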
Decimal Format
Type annotation: DECIMAL(precision, scale)
- `precision` — total number of significant digits
- `scale` — number of digits to the right of the decimal point
Both decimal-point inputs (`12345.67`) and integer-only inputs (`12345`) are accepted.
The value is stored as a scaled integer (e.g., 12345.67 with scale 2 is stored as 1234567).
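The scaled-integer encoding can be sketched as follows (illustrative, not the actual storage code):

```python
from decimal import Decimal

def to_scaled_int(text: str, scale: int) -> int:
    """Shift the decimal point right by `scale` digits and keep the
    resulting integer, e.g. 12345.67 with scale 2 becomes 1234567."""
    return int(Decimal(text).scaleb(scale))

print(to_scaled_int("12345.67", 2))  # -> 1234567
print(to_scaled_int("42", 2))        # -> 4200 (integer-only input)
```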
Null Values
An empty field is treated as NULL:
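For example (a sketch; names and values are illustrative):

```
:ID(Person)|firstName:STRING|age:INT|score:DOUBLE
1|Alice|30|
2|Bob||9.5
```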
- `Alice` row: `score` is NULL
- `Bob` row: `age` is NULL
Edge Files — Forward and Backward
TurboLynx stores two adjacency lists per edge type: one for forward traversal (start → end) and one for backward traversal (end → start). Both files must have the same property columns.
Forward file (:START_ID first):
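For example (sample data; the `since` property is illustrative):

```
:START_ID(Person)|:END_ID(Person)|since:DATE
1|2|2022-01-15
1|3|2023-06-01
2|3|2023-06-01
```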
Backward file (:END_ID first, rows sorted by the first column):
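For example, the backward counterpart of forward rows `1|2`, `1|3`, `2|3` (with an illustrative `since` property):

```
:END_ID(Person)|:START_ID(Person)|since:DATE
2|1|2022-01-15
3|1|2023-06-01
3|2|2023-06-01
```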
The backward file is the same data with the ID columns swapped and the rows re-sorted by the new first column (END_ID).
Convention: Name backward files with a `.backward` suffix, e.g., `knows.csv` → `knows.csv.backward`.
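The swap-and-resort transformation can be sketched in Python (illustrative; the sample rows and `since` property are made up, and the header row is assumed to be handled separately):

```python
def make_backward(forward_rows):
    """Derive backward-file data rows from forward-file data rows:
    swap the first two pipe-separated fields (START_ID and END_ID),
    then re-sort by the new first column."""
    swapped = []
    for row in forward_rows:
        fields = row.split("|")
        fields[0], fields[1] = fields[1], fields[0]
        swapped.append("|".join(fields))
    return sorted(swapped, key=lambda r: int(r.split("|", 1)[0]))

forward = ["1|2|2022-01-15", "1|3|2023-06-01", "2|3|2023-06-01"]
print(make_backward(forward))
# -> ['2|1|2022-01-15', '3|1|2023-06-01', '3|2|2023-06-01']
```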
JSON
TurboLynx reads JSON files using the yyjson parser.
Top-level structure
A JSON file must be a single object with either a "vertices" key (for vertex files) or an "edges" key (for edge files), whose value is an array of objects.
Vertex file:
```json
{
  "vertices": [
    { "id": 1, "firstName": "Alice", "lastName": "Smith", "age": 30 },
    { "id": 2, "firstName": "Bob", "lastName": "Jones", "age": 25 }
  ]
}
```
Edge file:
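A sketch of an edge file in the same shape (the `start` / `end` key names and the `since` property are assumptions for illustration, not confirmed field names):

```json
{
  "edges": [
    { "start": 1, "end": 2, "since": "2022-01-15" },
    { "start": 2, "end": 3, "since": "2023-06-01" }
  ]
}
```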
Supported JSON value types
| JSON type | Mapped to |
|---|---|
| boolean | `BOOLEAN` |
| integer | `INTEGER` / `BIGINT` / `UBIGINT` |
| number (float) | `FLOAT` / `DOUBLE` |
| string | `VARCHAR` |
Note: `DECIMAL` is not supported in the JSON path.
Parser flags
The JSON reader is permissive: it allows Inf/NaN values and trailing commas.
Directory Layout
Place all vertex and edge files in a flat directory.
The bulkload tool scans the directory and infers file roles from the header annotations.
```
dataset/
├── person.csv                        ← vertex file (:ID annotation present)
├── comment.csv                       ← vertex file
├── person_knows_person.csv           ← edge file (:START_ID / :END_ID present)
├── person_knows_person.csv.backward  ← backward edge file
└── ...
```
File-type detection
| Header contains | Interpreted as |
|---|---|
| `:ID(...)` | Vertex file |
| `:START_ID(...)` and `:END_ID(...)` | Edge file (forward) |
| `:END_ID(...)` appears first | Edge file (backward) |
Running Import
```shell
./tools/turbolynx import \
  --workspace /path/to/db \
  --nodes Person data/person.csv \
  --nodes Comment data/comment.csv \
  --relationships KNOWS data/person_knows_person.csv
```
| Option | Description |
|---|---|
| `--workspace` | Directory where `store.db` and `catalog.bin` will be written |
| `--nodes <Label> <file>` | Vertex CSV file (repeatable) |
| `--relationships <Type> <file>` | Edge CSV file (repeatable) |
After a successful load the schema is persisted to `<workspace>/catalog.bin` and the graph data to `<workspace>/store.db`. Subsequent import runs will append to the existing store.
Complete Example
Vertex file — `person.csv`

```
:ID(Person)|firstName:STRING|lastName:STRING|age:INT|score:DOUBLE|joined:DATE
1|Alice|Smith|30|9.5|2022-01-15
2|Bob|Jones|25||2023-06-01
```