When loading a file, two additional metadata fields are added to each record: _filename and _index (the row number in the file). These fields are included automatically because they help trace where data came from when it is consumed downstream.
Computing the _index column in a distributed execution engine like Spark is expensive, so it is sometimes not worth the time or cost of precisely resolving the row number. When contiguousIndex is set to false, Spark instead adds a different field, _monotonically_increasing_id, which is a non-sequential, non-contiguous identifier. _index can be derived from it later, without incurring the cost of resolving _index at load time.
Default: true.
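
To illustrate how _index could be recovered later from _monotonically_increasing_id, here is a minimal plain-Python sketch rather than the tool's own implementation. It relies on the documented layout of Spark's monotonically_increasing_id function (partition ID in the upper 31 bits, record number within the partition in the lower 33 bits). The helper names monotonic_id and derive_index are hypothetical, numbering from 1 is an assumption, and the sketch assumes that ordering by the identifier within a file matches the original row order.

```python
from collections import defaultdict

# Spark keeps the per-partition record number in the lower 33 bits
# and the partition ID in the upper 31 bits of the 64-bit identifier.
PARTITION_SHIFT = 33


def monotonic_id(partition_id: int, row_in_partition: int) -> int:
    """Hypothetical helper mimicking the bit layout of Spark's identifier."""
    return (partition_id << PARTITION_SHIFT) | row_in_partition


def derive_index(records, start=1):
    """Return copies of the records with a contiguous _index per _filename.

    Assumption: sorting by _monotonically_increasing_id within a file
    reproduces the original row order; start=1 is also an assumption.
    """
    by_file = defaultdict(list)
    for record in records:
        by_file[record["_filename"]].append(record)

    result = []
    for filename in sorted(by_file):
        ordered = sorted(
            by_file[filename],
            key=lambda r: r["_monotonically_increasing_id"],
        )
        for offset, record in enumerate(ordered):
            result.append({**record, "_index": start + offset})
    return result


# Three rows of one file, spread over two partitions and arriving out of order.
records = [
    {"value": "c", "_filename": "a.csv", "_monotonically_increasing_id": monotonic_id(1, 0)},
    {"value": "a", "_filename": "a.csv", "_monotonically_increasing_id": monotonic_id(0, 0)},
    {"value": "b", "_filename": "a.csv", "_monotonically_increasing_id": monotonic_id(0, 1)},
]
indexed = derive_index(records)
```

In Spark itself the same idea would typically be expressed as a row-number window partitioned by _filename and ordered by _monotonically_increasing_id. That sort is the cost that setting contiguousIndex to false postpones.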