Reading and understanding exported sessions data
Auto-export periodically uploads files to an AWS S3 bucket that you provide. You can configure the export format and the export frequency in the Web UI. See Exporting Data.
Sessions data can also be exported using the API (see Exporting raw data).
To provide a basic structure, the uploaded files are prefixed with a date, for example "20190508/export-4842631999258612-37e812c0-d66e-48a2-8812-27ee12d58b58-output-0". The date prefix reflects when the export job was initiated on the Leanplum side (in PST) and is unrelated to the content of the files: a file may contain sessions from different dates that became available in Leanplum since the last successful export job.
Once all files for a particular job are uploaded, Leanplum uploads a "manifest" file, which signals that the upload is complete and the files are ready to be consumed. It has the same name as the data files, but ends with "manifest" instead of a shard id, e.g. "20190508/export-4842631999258612-37e812c0-d66e-48a2-8812-27ee12d58b58-output-manifest". The manifest is a JSON file that contains a list of all uploaded files. Do not process uploaded files that have no corresponding manifest file, because they belong to an upload that has not yet completed. If a set of files fails to upload completely after a few retries, the manifest will never arrive and the partial data files should be discarded. The next successful scheduled export job will include all sessions from previously failed exports.
If the export file format is CSV, the file names contain the type of data included in them: sessions, experiments, states, events, eventparameters, or userattributes, e.g. "20190508/export-4842631999258612-37e812c0-d66e-48a2-8812-27ee12d58b58-outputsessions-0". Each data type has a separate manifest file, e.g. "20190508/export-4842631999258612-37e812c0-d66e-48a2-8812-27ee12d58b58-outputsessions-manifest".
To process each export job as it arrives, wait for the manifest file(s) to appear and then process the files listed in the manifest(s).
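The "wait for the manifest" rule can be sketched as a check on the object keys found in the bucket. This is a minimal illustration, not Leanplum code: the helper names split_key and ready_data_files are hypothetical, and the sketch relies only on the naming pattern shown above (a data file ends with a shard id, its manifest ends with "manifest"). The authoritative file list is the one inside the manifest JSON, whose layout is not described here, so the sketch does not parse it.

```python
from collections import defaultdict


def split_key(key):
    """Split an object key into (stem, suffix) at the last hyphen.

    The suffix is either a shard id such as "0" or the word "manifest".
    """
    stem, _, suffix = key.rpartition("-")
    return stem, suffix


def ready_data_files(keys):
    """Return {stem: [data file keys]} for uploads whose manifest exists.

    Data files without a matching manifest are left out, because they
    belong to an upload that is incomplete or has failed.
    """
    shards = defaultdict(list)
    complete = set()
    for key in keys:
        stem, suffix = split_key(key)
        if suffix == "manifest":
            complete.add(stem)
        else:
            shards[stem].append(key)
    return {stem: sorted(shards[stem]) for stem in complete}


# Hypothetical bucket listing: the first job is complete, the second is not.
keys = [
    "20190508/export-1-aaaa-output-0",
    "20190508/export-1-aaaa-output-1",
    "20190508/export-1-aaaa-output-manifest",
    "20190509/export-2-bbbb-output-0",
]
ready = ready_data_files(keys)
```

Here ready contains only the two data files of the first job; the file from the second job is skipped until its manifest appears. With CSV exports the same grouping yields one entry per data type, since each type has its own manifest.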
Timestamps
All times inside the exported files are in UTC. The Analytics dashboard, however, uses PST, so you need to convert the timestamps when comparing data from the exported files with dashboard Analytics.
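The conversion can be sketched as follows. Two assumptions are made for illustration only: the timestamp is given as Unix epoch seconds (the actual field format depends on the export schema), and "PST" is taken literally as a fixed UTC-8 offset. If the dashboard follows daylight saving time, a named zone such as zoneinfo.ZoneInfo("America/Los_Angeles") would be the better choice.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PST treated as a fixed UTC-8 offset, without daylight saving.
PST = timezone(timedelta(hours=-8), "PST")


def utc_epoch_to_pst(epoch_seconds):
    """Convert a UTC epoch timestamp to a timezone-aware PST datetime."""
    utc_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc_time.astimezone(PST)


# 1557273600 is 2019-05-08 00:00:00 UTC.
local = utc_epoch_to_pst(1557273600)
```

In this example local falls on 2019-05-07 at 16:00 PST, so a session exported with a UTC date of May 8 is counted under May 7 on the dashboard. Daily metrics should be bucketed after the conversion, not before.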
Filtering sessions
The property 'isSession' shows whether the record is a legitimate user interaction, rather than an offline event like 'MessageX Sent'. The calculation of all session metrics (DAU, First-time users, Retention, Total sessions, etc.) should filter for 'isSession = true'.
When counting message events, purchases or custom events/states, 'isSession' should be ignored.
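As a rough illustration of the filtering rule, the sketch below computes two session metrics from already parsed records. Only 'isSession' comes from the export; the 'userId' field and the function names are hypothetical, and in a CSV export the flag may arrive as text and need parsing first.

```python
def session_records(records):
    """Keep only records that represent a legitimate user interaction."""
    return [r for r in records if r.get("isSession") is True]


def session_metrics(records):
    """Return (total sessions, distinct active users) using isSession = true."""
    sessions = session_records(records)
    users = {r["userId"] for r in sessions}  # "userId" is an assumed field name
    return len(sessions), len(users)


# Hypothetical records: two real sessions and two offline records.
records = [
    {"userId": "u1", "isSession": True},
    {"userId": "u1", "isSession": False},
    {"userId": "u2", "isSession": True},
    {"userId": "u3", "isSession": False},
]
total_sessions, active_users = session_metrics(records)
```

Counts of message events, purchases, or custom events/states should be computed over the unfiltered records instead, since those can occur in records where 'isSession' is false.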
Events with no parent state
An event can be tracked while no state is active. In JSON format, such events are exported wrapped in a 'dummy' state with no name and a random ID.
In CSV format, the event is assigned a random 'StateId' and there will be no corresponding row in the 'states' table.
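For CSV exports, a consumer that joins events to states needs to tolerate these missing rows. The sketch below finds such events, assuming the rows have already been loaded as dictionaries and that both tables expose the state identifier under a 'StateId' column (the column name in the 'states' table is an assumption, as are the function name and the 'name' field).

```python
def events_without_state(event_rows, state_rows):
    """Return events whose StateId has no matching row in the states table."""
    known_state_ids = {row["StateId"] for row in state_rows}
    return [row for row in event_rows if row["StateId"] not in known_state_ids]


# Hypothetical rows: the second event was tracked with no active state.
state_rows = [{"StateId": "s1", "name": "Checkout"}]
event_rows = [
    {"StateId": "s1", "name": "Purchase"},
    {"StateId": "r7", "name": "Push Opened"},
]
orphans = events_without_state(event_rows, state_rows)
```

An inner join between the two tables would silently drop such events, so a left join from events to states (or a check like the one above) is the safer default.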
Schema