Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We need to be able to use parameterised insert queries to populate a table during migration #1292

Open
OracPrime opened this issue Oct 29, 2024 · 8 comments
Labels
s: pending triage Pending Triage

Comments

@OracPrime
Copy link

Description

It is currently quite messy to use parameterised inserts. If you naively use pgm.sql with $1 style parameters you get an error message related to regular expressions which is somewhat impenetrable if your not expecting mustache templating. If you use mustache templating, you are open to injection attacks. The only solution I can find is using pg-format for build the string (and I'm a bit worried if the result of that itself contains, for example $1.

Suggested solution

It would be nice to have access to an equivalent to pgm.sql() which doesn't do the mustaching. psm.sql_raw() for example. Either a new function, or an additional options argument on the existing one. However I prefer the separate function approach as it keeps the meaning simple and the option argument might be confused with an argument for the sql.

Alternative

pg-format is the only one I'm aware of.

Additional context

At minimum the regex error should be decorated with some explanation?

@OracPrime OracPrime added the s: pending triage Pending Triage label Oct 29, 2024
@Shinigami92
Copy link
Collaborator

Please note that we don't really need to be careful/sanitize parameters, because you should not use node-pg-migrate at runtime, but only for migration progresses.

That aside, what would you propose as an API?

And how does a current use-case look like right now (rough example).

@OracPrime
Copy link
Author

The current use case if for populating a static lookup table. I'd agree in theory you're only using your own data, so you should feel slightly safe, but in an open source environment an innocuous update of a data file could become an injection attack, so I want to be paranoid.

I think my proposed api is to have a function that looks exactly like sql but which doesn't mustache.

Current use case it this (previously insert with $1... $5)

	async up(pgm) {
		pgm.createTable("uk_stations", {
			station_id: { notNull: true, primaryKey: true, type: "bigserial" },
			station_name: { notNull: true, type: "text" },
			sixteen_character_name: { notNull: true, type: "text" },
			longitude: { notNull: true, type: "float8" },
			latitude: { notNull: true, type: "float8" },
			crs_code: { notNull: true, type: "char(3)" },
		});

		const filePath = path.join(__dirname, "../ukStations.csv");
		const fileContent = fs.readFileSync(filePath, "utf8");
		const records = parse(fileContent, {
			columns: true,
			skip_empty_lines: true,
		});

		// Prepare data for bulk insertion
		const values = records.map(
			({
				station_name,
				sixteen_character_name,
				latitude,
				longitude,
				crs_code,
			}) => [
				station_name,
				sixteen_character_name,
				latitude,
				longitude,
				crs_code,
			],
		);

		const query = format(
			`INSERT INTO uk_stations (station_name, sixteen_character_name, latitude, longitude, crs_code) VALUES %L`,
			values,
		);
		pgm.sql(query);
	},

@Shinigami92
Copy link
Collaborator

Just me thinking out loud: Not sure if we should provide a insert (and then update and delete) statement, because these are not really anymore migrations but populations 🤔
Currently I would suggest to use node-pg-migration as a migration tool to have your DB versioned and use other tools (potentially knex) for population after the migration was executed.

However, I even know from myself in company work that there is best practice and real world scenario 🤣 ...

So maybe we could add just the absolutely simplest form of such functions that in the end will more or less just to what you already do yourself with the format function.

pgm.insert('uk_station', {
  station_name: '...',
  // data for one station...
});

pgm.insertBulk('uk_station', [
  {
    station_name: '...',
    // data for one station...
  },
  // ...
]);

I have not thought that fully out yet, but I would be okay with it if you explore an initial draft PR on your own with that info and then I can review and see where it is going.
Sounds good?

@firelizzard18
Copy link

@Shinigami92 Some kinds of data are part of the schema, or in other words some tables are pointless without data. If I have an entity such as book and another entity to classify the first such as mediaType (e.g. paper, ebook, audiobook, etc), the class entity is effectively part of the schema IMO and should be populated by the migration. pgm.insert would work for my purposes.

@Shinigami92
Copy link
Collaborator

@Shinigami92 Some kinds of data are part of the schema, or in other words some tables are pointless without data. If I have an entity such as book and another entity to classify the first such as mediaType (e.g. paper, ebook, audiobook, etc), the class entity is effectively part of the schema IMO and should be populated by the migration. pgm.insert would work for my purposes.

Do you know https://www.postgresql.org/docs/current/datatype-enum.html ?

@firelizzard18
Copy link

Do you know https://www.postgresql.org/docs/current/datatype-enum.html ?

I was not aware of that. If I were designing a new schema I might use that, but I've inherited an existing schema and reworking it is not an option at this time. That being said, I often like to have metadata attached to my classes/enumerations.

@Shinigami92
Copy link
Collaborator

As I said, I'm not against such an implementation, because I know how real-life and company work is. However, some real-talk... I'm currently in a mental health breakdown (partly because of today...).
So I reduce currently as much as possible to work for others if I do not benefit from it myself.
But what we can do here, is to work in a community and someone of the persons in this issue can start a PR (implementation, documentation and tests), and I can review and help from there. I hope that sounds fair.

@firelizzard18
Copy link

That's certainly fair, I am not entitled to your free labor. If this becomes a blocker for me I will open a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
s: pending triage Pending Triage
Projects
None yet
Development

No branches or pull requests

3 participants