Change Log

This project follows Semantic Versioning.
truncate option for postgres.
NaN/Inf values are now ignored for xlsx output, as they caused a crash.
Timezone-aware date types are now accepted in postgres.
Fixed panic when large xlsx cell values were truncated while multi-threading.
Fixed panic when a large xlsx cell value was truncated in the middle of a unicode character.
Upgrade deps, low_memory option for API
Upgrade deps, better build times due to latest duckdb
arrays_as_table option added to convert all arrays to their own tables.
Errors get raised for postgresql conversion.
Fixed Parquet header naming, which was incorrect for dates.
Allow multiple files while downloading from s3
Stop detecting floats where precision is too low.
Fixed CSV output to S3, which was broken in some cases.
Stop csv directory being made when using S3
JSON input sources from STDIN, HTTP and S3; all inputs may be gzipped if they have a .gz ending.
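The .gz handling described above can be sketched in Python. This is only an illustration of the idea, not flatterer's internal (Rust) code; the function name is hypothetical:

```python
import gzip
import io


def open_json_input(path: str):
    """Open a JSON input file, transparently decompressing when the
    name ends in .gz, and return a text-mode reader."""
    raw = open(path, "rb")
    if path.endswith(".gz"):
        # Wrap the raw stream in a gzip decompressor before decoding.
        return io.TextIOWrapper(gzip.GzipFile(fileobj=raw), encoding="utf-8")
    return io.TextIOWrapper(raw, encoding="utf-8")
```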
Command line now accepts multiple files from any source.
Better type guessing for database inserts.
no_link option that removes _link fields in the output.
Truncate cells that are larger than xlsx allows.
Allow more rows in xlsx in non threaded mode.
Web Assembly version of libflatterer. Available to use here https://lite.flatterer.dev/.
Upgrade to Vue 3 and Vite for the web frontend.
Ignore blank lines in json lines files
Better errors when too many files are open
Support python 3.11
Fixed error when writing larger XLSX files.
CORS support for the web API.
Local web interface for exploring flatterer features: flatterer --web.
evolve option for sqlite and postgres. Can add data to existing tables and will alter tables if new fields are needed.
drop option for sqlite.
Postgres connection from environment variable
sql_script option to export scripts for sqlite and postgres, making output backward compatible with earlier versions.
pushdown option. Copy data from top level objects down to child (one-to-many) tables. This is useful if the data has its own keys (such as id fields) that you want to exist in the related tables. Also useful for denormalizing the data, so that querying on a common field requires less joining.
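The effect of pushdown can be sketched with plain dictionaries. This is a minimal illustration of the behaviour described above, not flatterer's implementation; the function name is hypothetical:

```python
def pushdown(parent: dict, array_key: str, pushdown_fields: list) -> list:
    """Copy the named top-level fields of `parent` onto each row of the
    one-to-many child array stored under `array_key`."""
    rows = []
    for child in parent.get(array_key, []):
        row = dict(child)  # copy so the original child is untouched
        for field in pushdown_fields:
            if field in parent:
                row[field] = parent[field]
        rows.append(row)
    return rows


doc = {"id": 7, "name": "widget", "items": [{"sku": "x"}, {"sku": "y"}]}
# Each child row now carries the parent's id, so the child table can be
# queried or joined on it directly.
child_rows = pushdown(doc, "items", ["id"])
```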
postgres option. Export to a postgres database by supplying a connection string.
files option, so multiple files can be supplied at once.
Threads option can now output xlsx.
Threads option, so that flatterer can run on all cores. Works best with ndjson input.
Parquet export option.
BREAKING: json-lines option renamed to ndjson.
New json-stream option that works the same way as the old json-lines option and accepts concatenated JSON.
Datapackage output uses correct date type
Lists of strings are now escaped the same way as optional quoted CSVs
Clearer errors when an error happens in rust. BREAKING CHANGE: if catching certain error types in python, these may have changed.
datapackage output now has foreign keys.
Python decimal converted to float not string.
SQLite export uses less memory.
SQLite export has indexes and foreign key constraints.
Fixed exception when main_table_name was a number.
Fixed handling of a list of JSON strings supplied to flatten.
datapackage.json named correctly.
flatten python function now accepts an iterator.
Docs for flatten
More lenient if tmp directory can not be deleted.
Preview option in python CLI and library.
SQLite export option
Support top level object. All lists of objects are streamed and the top level object data is saved in the main table.
New yajlish parser for both JSON stream and arrays.
Library has schema_guess function to tell if data is a JSON stream or an array of objects.
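The stream-versus-array distinction can be sketched in a few lines of Python. This is an illustration of the idea only, not the library's schema_guess implementation, and the function name is hypothetical:

```python
def guess_json_shape(text: str) -> str:
    """Guess whether `text` is a JSON array of objects or a stream of
    concatenated/newline-delimited objects, by its first non-space char."""
    first = next((c for c in text if not c.isspace()), "")
    if first == "[":
        return "array"
    if first == "{":
        return "stream"
    return "unknown"
```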
Empty objects do not make a line in output.
ctrlc support added.
Logging output improved.
Traceback not shown for CLI use.
Fixed occurrences where the output folder was not being deleted.
tables.csv input in order to control tab names (Tables File option).
Beginning to use logging.
Better handling of long excel sheet names. See https://github.com/kindly/flatterer/issues/12
field_type no longer required in fields.csv.
More human readable error messages.
Bad characters in XLSX stripped and raise warning.
Check limits on XLSX files and raise error if found.
Removed unwrap on channel send, to remove possible panic.
Table ordering of output in JSON input order, making xlsx and fields.csv table order reflect the input data.
Lib has new FlatFiles::new_with_defaults() to make using the library less verbose.
Use insta for more tests.
Lib has preview option, meaning CSV output will optionally only show specified number of lines.
Paths to data in sqlite and postgres start at root of output.
Clippy for linting and insta for tests.
Do less work when just exporting Metadata.
Minor speedup due to not using format so much.
Change to pypi metadata
Tests run in action
Fixed regression in speed due to new error handling.
New error handling using anyhow, giving errors more context.
Schema option to supply JSONSchema, to make field order the same as schema.
Table prefix option to namespace exported tables.
Postgresql and sqlite scripts to load CSV data into databases.
Wheel builds for Windows and MacOS, automatically published using GitHub Actions.
Inline One to One option, meaning that if an array only ever has one item in it for all the data, it is treated as a sub-object.
Truncate Tables#
For postgres and sqlite, truncate the existing table if it exists. This is useful if you want to load the data into a database with the schema pre-defined. Warning: this could mean you lose data.
CLI Usage#
flatterer --postgres='postgres://user:pass@host/dbname' --sqlite-path=sqlite.db INPUT_FILE OUTPUT_DIRECTORY --truncate
Python Usage#
import flatterer

flatterer.flatten('inputfile.json', 'output_dir', postgres='postgres://user:pass@host/dbname', truncate=True)
Fields File#
The table_name and field_name need to match up with the eventual structure of the output. The easiest way to make sure of this is to edit the fields.csv that is in an output directory.
You can generate just the fields.csv file by not outputting the CSV files.
By default, if there are fields in the data that are not in the fields.csv, they will be added to the output after the defined fields. Use Only Fields to change this behaviour so that fields not in the file will be excluded.
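As a sketch, a fields.csv might look like the following. The exact columns are best taken from a fields.csv that flatterer itself generated; the table names, field names and types here are hypothetical:

```csv
table_name,field_name,field_type
main,id,number
main,title,text
platforms,name,text
```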
CLI Usage#
flatterer INPUT_FILE OUTPUT_DIRECTORY --fields fields.csv
Python Usage#
import flatterer
flatterer.flatten('inputfile.jl', 'output_dir', fields='fields.csv')
Only Fields#
Only fields in the fields.csv file will be in the output.
CLI Usage#
flatterer INPUT_FILE OUTPUT_DIRECTORY --fields fields.csv --only-fields
Python Usage#
import flatterer
flatterer.flatten('inputfile.jl', 'output_dir', fields='fields.csv', only_fields=True)
Tables File#
The table_name has to be the name that would be output by flatterer. To make sure that these names are correct, it is best to use the tables.csv that is always in the output directory as a basis for modifying the output.
By default, if there are tables in the data that are not in the tables.csv, they will be added to the output after the defined tables. Use Only Tables to change this behaviour so that only tables in this file will be output.
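For illustration, a tables.csv pairing output table names with the desired tab names might look like this. The column names are assumptions; base yours on the tables.csv that flatterer generates:

```csv
table_name,table_title
main,Main
platforms,Platforms
```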
CLI Usage#
flatterer INPUT_FILE OUTPUT_DIRECTORY --tables tables.csv
Python Usage#
import flatterer
flatterer.flatten('inputfile.jl', 'output_dir', tables='tables.csv')
Only Tables#
Only tables in the tables.csv file will be in the output.
CLI Usage#
flatterer INPUT_FILE OUTPUT_DIRECTORY --tables tables.csv --only-tables
Python Usage#
import flatterer
flatterer.flatten('inputfile.jl', 'output_dir', tables='tables.csv', only_tables=True)
Inline One to One#
When a key has an array of objects as its value, but that array only ever has a single item in it, treat the single item as if it were a sub-object (not a sub-array).
Without this set, any array of objects will be treated like a one-to-many relationship and therefore have a new table associated with it. With this set, if all arrays under a particular key only have one item in them, the child table will not be created and the values will appear in the parent table.
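The check and the inlining step described above can be sketched as follows. This is a conceptual illustration, not flatterer's implementation; the function names are hypothetical:

```python
def can_inline(docs: list, key: str) -> bool:
    """True when every array under `key` across all docs has at most one
    item, so the option can safely treat it as a sub-object."""
    return all(len(doc.get(key, [])) <= 1 for doc in docs)


def inline_one_to_one(doc: dict, key: str) -> dict:
    """Replace a single-item array under `key` with its only element."""
    out = dict(doc)
    items = out.get(key, [])
    if len(items) == 1:
        out[key] = items[0]
    return out
```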
CLI Usage#
flatterer INPUT_FILE OUTPUT_DIRECTORY --inline-one-to-one
Python Usage#
import flatterer
flatterer.flatten('inputfile.json', 'output_dir', inline_one_to_one=True)
Arrays as Table