Data Import — File Formats
TurboLynx loads datasets via the bulkload tool. Two file formats are supported: CSV and JSON.
CSV
Delimiter
Fields are separated by a pipe character (|), not a comma.
Header Row
The first row is always a header. Each column is declared in the form `name:TYPE` (for example, `firstName:STRING`).
The header annotation drives schema inference — there is no separate schema file.
ID Column Annotations
Special column annotations identify vertex IDs and edge endpoints.
Vertex ID — :ID(Label)
- The label inside `(...)` is the vertex type name used for cross-referencing edge files.
- The ID value must be a non-negative integer (`UBIGINT` internally).
- Each vertex type has its own ID namespace; IDs only need to be unique within a type.
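For example, a minimal Person vertex file might begin like this (the property column is illustrative):

```
:ID(Person)|firstName:STRING
1|Alice
2|Bob
```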
Compound (multi-column) vertex ID — :ID_1(Label) / :ID_2(Label)
When a vertex is identified by two columns, use _1 and _2 suffixes:
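For example (the `City` label, `name` column, and values are illustrative):

```
:ID_1(City)|:ID_2(City)|name:STRING
10|200|Centerville
```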
Edge source / destination — :START_ID(Label) / :END_ID(Label)
- Values must match IDs declared in the corresponding vertex file.
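For example, a header for an edge between Person vertices might look like this (the `since` property is illustrative):

```
:START_ID(Person)|:END_ID(Person)|since:DATE
1|2|2022-01-15
```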
Property Column Types
| CSV type annotation | Internal type | Notes |
|---|---|---|
| `STRING` | `VARCHAR` | UTF-8 text |
| `STRING[]` | `VARCHAR` | Multi-value stored as plain text |
| `INT` | `INTEGER` | 32-bit signed integer |
| `INTEGER` | `INTEGER` | Alias for `INT` |
| `LONG` | `BIGINT` | 64-bit signed integer |
| `BIGINT` | `BIGINT` | Alias for `LONG` |
| `ULONG` | `UBIGINT` | 64-bit unsigned integer |
| `UBIGINT` | `UBIGINT` | Alias for `ULONG` |
| `FLOAT` | `FLOAT` | 32-bit IEEE 754 floating-point |
| `DOUBLE` | `DOUBLE` | 64-bit IEEE 754 floating-point |
| `BOOLEAN` | `BOOLEAN` | `true` / `false` (JSON only; not yet supported in CSV) |
| `DATE` | `DATE` | Calendar date — see Date Format |
| `DATE_EPOCHMS` | `DATE` | Milliseconds since Unix epoch — see Epoch Milliseconds |
| `DECIMAL(p,s)` | `DECIMAL` | Fixed-point — see Decimal Format |
Date Format
Type annotation: DATE
Accepted input format: ISO 8601 date
Examples: `2022-01-15`, `2023-06-01`
Epoch Milliseconds
Type annotation: DATE_EPOCHMS
The value is an integer representing milliseconds since the Unix epoch (1970-01-01 00:00:00 UTC). The parser divides the value by 1000 to obtain a Unix timestamp in seconds, then converts to a calendar date.
Note: Sub-second precision is truncated when converting to a date.
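The conversion described above can be modeled in Python (a sketch of the described behavior, not the parser's actual code):

```python
from datetime import datetime, timezone

def epochms_to_date(ms: int):
    """Model of the described conversion: integer-divide the millisecond
    value by 1000 to get whole seconds, then take the UTC calendar date.
    Sub-second precision is truncated by the integer division."""
    seconds = ms // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date()

# 2024-03-15 10:30:00.123 UTC expressed as epoch milliseconds
print(epochms_to_date(1710498600123))  # -> 2024-03-15
```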
Timestamp Format
TurboLynx uses the TIMESTAMP type internally (microsecond resolution, stored as int64_t).
When a column is declared DATE_EPOCHMS, the raw integer milliseconds value is accepted.
For string-formatted timestamps (used in queries and future CSV extensions), the parser accepts ISO 8601 with the following rules:
| Component | Description |
|---|---|
| `YYYY-MM-DD` | Date part (required) |
| `T` or a space | Separator between date and time (either is accepted) |
| `HH:MM:SS` | Time part in 24-hour clock |
| `.mmm` | Optional milliseconds (1–3 digits) |
| `Z` | Optional UTC suffix |
| `+HH:MM` / `-HH:MM` | Optional UTC offset; offsets are subtracted to normalize to UTC |
Examples of valid timestamp strings:

- `2024-03-15 10:30:00`
- `2024-03-15T10:30:00`
- `2024-03-15T10:30:00.123`
- `2024-03-15T10:30:00Z`
- `2024-03-15T10:30:00+09:00`
- `2024-03-15T10:30:00-05:30`

A bare date (`2024-03-15`) is also valid and is interpreted as midnight UTC.
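The offset-normalization rule can be checked with Python's `datetime` (illustrative only; Python is not part of TurboLynx):

```python
from datetime import datetime, timezone

# +09:00 means the local clock is 9 hours ahead of UTC, so the offset is
# subtracted to normalize: 10:30 at +09:00 is 01:30 UTC on the same day.
ts = datetime.fromisoformat("2024-03-15T10:30:00+09:00")
print(ts.astimezone(timezone.utc).isoformat())  # -> 2024-03-15T01:30:00+00:00
```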
Decimal Format
Type annotation: DECIMAL(precision, scale)
- `precision` — total number of significant digits
- `scale` — number of digits to the right of the decimal point
Both decimal-point inputs (`12345.67`) and integer-only inputs (`12345`) are accepted.
The value is stored as a scaled integer (e.g., 12345.67 with scale 2 is stored as 1234567).
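The scaled-integer encoding can be sketched as follows (illustrative, not the actual storage code):

```python
from decimal import Decimal

def to_scaled_int(text: str, scale: int) -> int:
    """Shift the decimal point right by `scale` digits and keep the
    resulting integer, e.g. 12345.67 with scale 2 becomes 1234567."""
    return int(Decimal(text).scaleb(scale))

print(to_scaled_int("12345.67", 2))  # -> 1234567
print(to_scaled_int("42", 2))        # -> 4200 (integer-only input)
```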
Null Values
An empty field is treated as NULL:
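For example (a sketch; names and values are illustrative):

```
:ID(Person)|firstName:STRING|age:INT|score:DOUBLE
1|Alice|30|
2|Bob||9.5
```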
- `Alice` row: `score` is NULL
- `Bob` row: `age` is NULL
Edge Files — Forward and Backward
TurboLynx stores two adjacency lists per edge type: one for forward traversal (start → end) and one for backward traversal (end → start). Both files must have the same property columns.
Forward file (:START_ID first):
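For example (sample data; the `since` property is illustrative):

```
:START_ID(Person)|:END_ID(Person)|since:DATE
1|2|2022-01-15
1|3|2023-06-01
2|3|2023-06-01
```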
Backward file (:END_ID first, rows sorted by the first column):
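For example, the backward counterpart of forward rows `1|2`, `1|3`, `2|3` (with an illustrative `since` property):

```
:END_ID(Person)|:START_ID(Person)|since:DATE
2|1|2022-01-15
3|1|2023-06-01
3|2|2023-06-01
```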
The backward file is the same data with the ID columns swapped and the rows re-sorted by the new first column (END_ID).
Convention: Name backward files with a `.backward` suffix, e.g., `knows.csv` → `knows.csv.backward`.
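The swap-and-resort transformation can be sketched in Python (illustrative; the sample rows and `since` property are made up, and the header row is assumed to be handled separately):

```python
def make_backward(forward_rows):
    """Derive backward-file data rows from forward-file data rows:
    swap the first two pipe-separated fields (START_ID and END_ID),
    then re-sort by the new first column."""
    swapped = []
    for row in forward_rows:
        fields = row.split("|")
        fields[0], fields[1] = fields[1], fields[0]
        swapped.append("|".join(fields))
    return sorted(swapped, key=lambda r: int(r.split("|", 1)[0]))

forward = ["1|2|2022-01-15", "1|3|2023-06-01", "2|3|2023-06-01"]
print(make_backward(forward))
# -> ['2|1|2022-01-15', '3|1|2023-06-01', '3|2|2023-06-01']
```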
JSON
TurboLynx reads JSON files using the yyjson parser.
Top-level structure
A JSON file must be a single object with either a "vertices" key (for vertex files) or an "edges" key (for edge files), whose value is an array of objects.
Vertex file:
```json
{
  "vertices": [
    { "id": 1, "firstName": "Alice", "lastName": "Smith", "age": 30 },
    { "id": 2, "firstName": "Bob", "lastName": "Jones", "age": 25 }
  ]
}
```
Edge file:
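A sketch of an edge file in the same shape (the `start` / `end` key names and the `since` property are assumptions for illustration, not confirmed field names):

```json
{
  "edges": [
    { "start": 1, "end": 2, "since": "2022-01-15" },
    { "start": 2, "end": 3, "since": "2023-06-01" }
  ]
}
```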
Supported JSON value types
| JSON type | Mapped to |
|---|---|
| boolean | `BOOLEAN` |
| integer | `INTEGER` / `BIGINT` / `UBIGINT` |
| number (float) | `FLOAT` / `DOUBLE` |
| string | `VARCHAR` |
Note: `DECIMAL` is not supported in the JSON path.
Parser flags
The JSON reader is permissive: it allows Inf/NaN values and trailing commas.
Directory Layout
Place all vertex and edge files in a flat directory.
The bulkload tool scans the directory and infers file roles from the header annotations.
```
dataset/
├── person.csv                        ← vertex file (:ID annotation present)
├── comment.csv                       ← vertex file
├── person_knows_person.csv           ← edge file (:START_ID / :END_ID present)
├── person_knows_person.csv.backward  ← backward edge file
└── ...
```
File-type detection
| Header contains | Interpreted as |
|---|---|
| `:ID(...)` | Vertex file |
| `:START_ID(...)` and `:END_ID(...)` | Edge file (forward) |
| `:END_ID(...)` appears first | Edge file (backward) |
Running Import
```shell
./tools/turbolynx import \
  --workspace /path/to/db \
  --nodes Person data/person.csv \
  --nodes Comment data/comment.csv \
  --relationships KNOWS data/person_knows_person.csv
```
| Option | Description |
|---|---|
| `--workspace` | Directory where `store.db` and `catalog.bin` will be written |
| `--nodes <Label> <file>` | Vertex CSV file (repeatable) |
| `--relationships <Type> <file>` | Edge CSV file (repeatable) |
After a successful load the schema is persisted to `<workspace>/catalog.bin` and the graph data to `<workspace>/store.db`. Subsequent import runs will append to the existing store.
Complete Example
Vertex file — `person.csv`

```
:ID(Person)|firstName:STRING|lastName:STRING|age:INT|score:DOUBLE|joined:DATE
1|Alice|Smith|30|9.5|2022-01-15
2|Bob|Jones|25||2023-06-01
```