Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convince me to use better-sqlite3 #262

Open
Bouncue opened this issue May 8, 2019 · 51 comments
Open

Convince me to use better-sqlite3 #262

Bouncue opened this issue May 8, 2019 · 51 comments

Comments

@Bouncue
Copy link

Bouncue commented May 8, 2019

What makes me hesitant to use better-sqlite3 is because node-sqlite3 more popular, if this is really
"better" why is the other party more popular?

@JoshuaWise
Copy link
Member

JoshuaWise commented May 8, 2019

There's a restaurant down the street from me called Yaso Tangbao. They have delicious beef soup and dumplings with a fantastic ginger vinegar sauce. Everyone that I've brought there has said it was some of the best food they've ever had. Right next to it, there's a McDonalds. Millions of people go to McDonalds every day, but Yaso Tangbao attracts a much more modest following. McDonalds is certainly more popular. Does this mean it's better? The obvious answer is "no", it just has greater brand recognition.

Brand recognition, whether you like it or not, controls all of our lives, even in the open source marketplace. As another example, the popular test framework mocha was actually unmaintained for years. It had so many bugs at one point that the number of issues in its GitHub repo reached quadruple digits. Despite this, it remained the most popular test framework in Node.js. Why? Because it has brand name recognition. People trust mocha, regardless of objective truth about its quality. (now it is maintained again, because of popular demand).

Even the mysql driver for Node.js is more popular than mysql2, despite them sharing the same authors. The authors have been very vocal about mysql2 being the superior successor to mysql, but many people still don't even know mysql2 exists simply because mysql came first. Newcomers will still choose mysql because it appears first when searched at npmjs.com.

In open source, being first is more important than being better. I can't think of a single example of a Node.js package that is more popular than the one it's trying to replace, even if it's clearly superior.

In any case, here are the reasons to choose better-sqlite3:

1. It's bug-free

Despite node-sqlite3 having 3x as many stars, it has 10x as many issues in GitHub. If you search through them, you'll see countless issues about race conditions, data corruption, crashing, unexpected behavior, and more. Alternatively, if you search through the better-sqlite3 issues, you'll exclusively find installation issues and suggestions for enhancement. It's safe to assume that if you can install better-sqlite3, it will work as expected.

2. It's faster

See the benchmarks results, where the performance of better-sqlite3 and node-sqlite3 are compared.

3. It's simpler to use

Let's say we'd like to execute an atomic transaction that performs 2 steps:

  1. update some data
  2. read some data

The proper implementation in node-sqlite3 looks like this:

const { promisify } = require('util');
const { Database }  = require('node-sqlite3');

// First, we have to promisify everything, since node-sqlite3 is callback-based
const open = promisify((path, cb) => new Database(path, function (err) { cb(err, this); }));
const get = promisify(Database.prototype.get);
const run = promisify(Database.prototype.run);
const close = promisify(Database.prototype.close);

// To safely handle a transaction, we must open a separate database connection per transaction
// See: https://github.com/mapbox/node-sqlite3/issues/304#issuecomment-45280758
async function myTransaction() {
  const db = await open('data.db');
  let result;
  try {
    await run.call(db, 'BEGIN');
    try {
      await run.call(db, 'UPDATE ...');
      result = await get.call(db, 'SELECT ...');
      await run.call(db, 'COMMIT');
    } catch (err) {
      try { await run.call(db, 'ROLLBACK'); }
      catch (_) { /* manually ignore cases where the sqlite3 automatically rolled back the transaction */ }
      throw err;
    }
  } finally {
    await close.call(db);
  }
  return result;
}

The proper implementation in better-sqlite3 looks like this:

const db  = require('better-sqlite3')('data.db');

const myTransaction = db.transaction(() => {
  db.prepare('UPDATE ...').run();
  return db.prepare('SELECT ...').get();
});

Not to mention, transactions in better-sqlite3 can automatically be nested. Can you imagine trying to write a nested transaction in node-sqlite3? Nightmare.

4. Features

In better-sqlite3, you can register custom functions and aggregate functions written in JavaScript, which you can run from within SQL queries.

In better-sqlite3, you can iterate through the cursor of a result set, and then stop whenever you want (you don't have to load the entire result set into memory).

In better-sqlite3, you can conveniently receive query results in many different formats (here, here, and here).

In better-sqlite3, you can safely work with SQLite's 64-bit integers, without losing precision due to JavaScript's number format.

None of the features listed above are possible in node-sqlite3.

Conclusion

better-sqlite3 wasn't first to the marketplace, so it doesn't have the brand recognition or years of legacy usage that node-sqlite3 has. But really, use whatever you want.

@alin1771
Copy link

alin1771 commented Nov 5, 2019

I did some comparison to node-sqlite3 by myself
first of all, synchronous calls should be avoided in a web server
just because fs.writeFile is slower than fs.writeFileSync it doesn't mean you should use fs.writeFileSync (there is a use case for each)
i've ran 1.000 inserts and updates and there was no difference in execution time
(maybe node-sqlite3 did some improvements meanwhile)

for 10.000 selects better-sqlite3 won but most of the time you are not using sqlite for selects (I guess)

the problem I had with node-sqlite3 is that it uses too much memory and doesn't release enough; I don't think is a memory leak but it bothers me and this is the reason I was looking into better-sqlite3 to solve my problem;

the sync only architecture is a huge trade-off for me

JoshuaWise should implement async or he should contribute to node-sqlite3 to share his improvements

@JoshuaWise
Copy link
Member

JoshuaWise commented Nov 5, 2019

@alin1771 With fs.writeFile, you can write multiple file simultaneously, where fs.writeFileSync can only write one file at a time. The fact you're missing is that both node-sqlite3 and better-sqlite3 can only execute one SQL statement at a time. Queries in SQLite3 are serialized, regardless of whether you wrap them in asynchronous primitives.

You should test how fast your queries are (e.g., 0.1 milliseconds), and decide how much server latency is an unacceptable amount for your users (let's say, 200 milliseconds). In this example, you'd be able to handle 2000 concurrent requests before your server latency becomes unacceptable. You could switch to node-sqlite3 to unblock your web server while you run queries, but that doesn't remove the bottleneck, it only shifts where the bottleneck occurs, because even node-sqlite3 is only able to do one query at a time.

Because of SQLite3's serialized nature, it's better to optimize for speed, not concurrency (which is limited to 1).

@alin1771 Without seeing your benchmark code, I cannot asses why you're not getting the performance observed by the benchmark in this repo, which is publicly reviewable by all. I'd start by making sure you turn on WAL mode, which is a necessity for any web server using SQLite3.

db.pragma('journal_mode = WAL');

@alin1771
Copy link

alin1771 commented Nov 5, 2019

with fs.writeFile not only you can write multiple files but you are also not blocking the main thread and this is the most important;

my tests looks something like this:
for (i = 0; i < 1000; i++) { db.prepare('UPDATE [test] SET value=? WHERE id=?').run(Math.random(), i) }
vs
db.serialize(() => {for (i = 0; i < 1000; i++) { db.run('UPDATE [test] SET value=? WHERE id=?', [Math.random(), i]) }})

for me, both are fast enough I could say but better-sqlite3 blocks the main thread;
indeed WAL increase the speed by a lot but this rely on my SSD availability at that time I guess, I dont know the magic behind it.
I use multiple processes for the same database and to call a checkpoint after each update is not ideal, setInterval is not a proper solution; checkpoint method is not async either.

I like your code, is very clean and light.
I will probably do this sync to async migration by myself and I hope you will see the benefit of it at some point and we can merge

@JoshuaWise
Copy link
Member

JoshuaWise commented Nov 5, 2019

@alin1771 If keeping the main thread unblocked is very important to you, you may be interested in this PR which enables better-sqlite3 to work in Worker Threads. This would probably be the best of both worlds for many people. I still need to review the changes, but the PR may be accepted in the near future.

@JoshuaWise
Copy link
Member

JoshuaWise commented Nov 5, 2019

One issue I see with your benchmark is that in better-sqlite3, db.prepare() only needs to be invoked once for a particular SQL string. The resulting Statement object can be reused over and over, which significantly improves performance. This isn't possible with node-sqlite3.

So a proper usage would actually be:

const stmt = db.prepare('UPDATE [test] SET value=? WHERE id=?');
for (i = 0; i < 1000; i++) stmt.run(Math.random(), i);

@alin1771
Copy link

alin1771 commented Nov 5, 2019

you can do the same with node-sqlite3, they support prepare as well;
I understand these improvements but they are not a real use-case and there is no point to improve a benchmark code;
most of the time these update queries come from different api calls, from different users and I cannot group them;

I will take a look into that PR but from what I know worker_threads are not as performant as "native" async; but sounds good anyway

thank you @JoshuaWise

@JoshuaWise
Copy link
Member

JoshuaWise commented Nov 5, 2019

most of the time these update queries come from different api calls, from different users and I cannot group them;

You can still reuse prepared statements by keeping a global object of all prepared statements used throughout the application. If you don't like globals, you could move it into its own module.

you can do the same with node-sqlite3, they support prepare as well;

Oops, I was wrong, it looks like you can reuse prepared statements in node-sqlite3 now. Thanks for correcting me.

Something else you might be interested in is that node-sqlite3 doesn't support transactions unless you use a different database connection for every transaction. You would think db.serialize() would be enough, but it doesn't allow you to handle errors within transactions. See this thread. That's one of the main reasons I built better-sqlite3. Opening a new database connection for each transaction was far too expensive for me.

@ronburk
Copy link

ronburk commented Jan 22, 2020

Not to argue, but hoping to better grasp the design/intent:

Because of SQLite3's serialized nature, it's better to optimize for speed, not concurrency

Seems like this philosophy pays off when most HTTP requests need to hit the database and the database I/O is the lion's share of the time required to process each request. Seems like this philosophy would not pay off if, say, half your HTTP requests need to do no disk I/O at all. In this latter case, presumably, all the time spent by sqlite3 waiting on the disk could have been spent moving along the HTTP requests that do not access the disk.

So probably WAL mode tilts the equation in favor of ignoring concurrency (if there are enough updates in the mix) and SSD instead of spinning disk likewise raises the odds that concurrency will not help enough to care. Both factors that reduce the time wasted making V8 wait for disk I/O.

Opening a new database connection for each transaction was far too expensive for me.

I think here you're saying you can reuse an sqlite connection instead. I think I understand that sqlite3 does not separate transaction context from connection. I mean, I don't get any kind of separate transaction object that would allow me to be conducting two separate transactions at the same time over the same connection. So if someone did not follow your advice to avoid letting a transaction spread across one event loop tick, they would either need to serialize access to one sqlite3 connection, or else open a new connection to handle the next transaction that comes along before the first connection is available to be reused.

Queries in SQLite3 are serialized,

But the at-most-one writer does not block readers in WAL mode, right? In which case, it seems like not being able to overlap a read-only connection with another connection reserved for updates might leave significant performance gains on the table, depending on the mix of requests (recalling the UW research that suggests server throughput under load can be dramatically improved by giving priority to shorter-running requests). Ultimately, the disk itself is serialized, but we typically try to keep it loaded up with requests so it's never waiting on us.

Did I get any of that right?

@JoshuaWise
Copy link
Member

Seems like this philosophy pays off when most HTTP requests need to hit the database and the database I/O is the lion's share of the time required to process each request. Seems like this philosophy would not pay off if, say, half your HTTP requests need to do no disk I/O at all. In this latter case, presumably, all the time spent by sqlite3 waiting on the disk could have been spent moving along the HTTP requests that do not access the disk.

That's correct.

So probably WAL mode tilts the equation in favor of ignoring concurrency (if there are enough updates in the mix) and SSD instead of spinning disk likewise raises the odds that concurrency will not help enough to care. Both factors that reduce the time wasted making V8 wait for disk I/O.

Again, correct.

I think here you're saying you can reuse an sqlite connection instead. I think I understand that sqlite3 does not separate transaction context from connection. I mean, I don't get any kind of separate transaction object that would allow me to be conducting two separate transactions at the same time over the same connection. So if someone did not follow your advice to avoid letting a transaction spread across one event loop tick, they would either need to serialize access to one sqlite3 connection, or else open a new connection to handle the next transaction that comes along before the first connection is available to be reused.

This is correct. However, it's important to note that while node-sqlite3 (the popular async library) does provide a way to serialize access to a connection, it's not sufficient for writing transactions because it doesn't provide a way to catch errors and rollback the transaction. That's essentially the topic of this thread.

But the at-most-one writer does not block readers in WAL mode, right? In which case, it seems like not being able to overlap a read-only connection with another connection reserved for updates might leave significant performance gains on the table, depending on the mix of requests (recalling the UW research that suggests server throughput under load can be dramatically improved by giving priority to shorter-running requests). Ultimately, the disk itself is serialized, but we typically try to keep it loaded up with requests so it's never waiting on us.

I've tried running benchmarks where async connections (node-sqlite3) were trying to perform read requests concurrently with an overlapping writer connection, but I couldn't get it to perform as well as this library. Perhaps you could try to replicate the UW research by implementing some kind of priority queue, where each prepared statement was ranked with a performance cost. Maybe it'll work, maybe it won't. Either way, no async library I'm aware of (e.g., node-sqlite3) is implemented like that currently.

@eslachance
Copy link

eslachance commented Mar 11, 2020

I want to chime in to this conversation, because better-sqlite3 has done a lot to make the lives of many beginner nodejs developers simpler, even if they might not know at face value that it does. I'm the developer of a small-ish database wrapper called Enmap, which has evolved a lot since I started writing it years ago. It started with level-db, then I moved to having multiple plugins for the backend (sqlite being only one of them).

For the last year, however, I've locked myself into using better-sqlite3 because Enmap's main focus is to be simple to use for beginners, and I've found that having the ability to make synchronous requests to the database has helped tremendously in this endeavour. Every other database connector required the uses of promises, or the loading of the entire data set in memory, in order for me to offer the features that Enmap offers. For example, I could not synchronously auto-fetch data when it's required if I was using the sqlite (or, god forbids, the sqlite3 callback hell) modules.

better-sqlite3 being the only sync connector has helped Enmap grow to the point where I'm expecting to reach a million total downloads by the end of the summer. And when I'm not using my own module, I'll automatically default to better-sqlite3 for rapid prototyping because it's faster, simpler, and requires less setup.

To counter the shameless self-promotion, I'll add that as far as I'm concerned, the fact that my main "competitor" quick.db has switched to better-sqlite3 in order to remove promises was probably due in part to my own switch, but it's undeniably popular, having itself reached a million downloads this very week. Both of us have a great deal of gratitude owed to Joshua for our popularity and continued success.

@x3cion
Copy link

x3cion commented Apr 16, 2020

2 Cents, since I stumbled across this: I'm using better-sqlite3 professionally since 3 years and never been disappointed. The blocking was never a problem to us, but that's because SQLite is very fast (if handled properly; PRAGMA page_size / temp_store / journal_mode / synchronous anyone? 😀 ) and the task at hand was focused build and reading those gb's of sqlite files.

I can see why it's still common to use sync calls for simplicity and I didn't dig deep enough to get an idea of the current involved overhead for async writes. I can also see why for other tasks, blocking the main thread can be an issue. Node.js already handles async writes in its own thread and the best CPU is the one with 100% load all the time.

The performance overhead regarding async dropped alot since the initial choice for sync, so with async/await, which increases simplicity regarding asynchroniousity and the performance increases for promises, there might be a chance to go this async route nowadays without losing too much performance or even adding some. After all, not blocking the main thread might parallel the preparation of upcoming queries while writing the current one. Have there been PR's for direct asynchroniousity or only for worker threads?

Either way, better-sqlite3 never failed us. We have some complex things going on and this package has never been an issue, but a huge benefit regarding performance.

@patarapolw
Copy link

I am looking back into better-sqlite3 because,

  • Simpler for transactions
  • Simpler sqlite3.each / betterSqlite3.iterate

However, I still have problem with Electron, where node-sqlite3 is probably guaranteed to succeed. #126

@TurtleIdiot
Copy link

better-sqlite3 is way better for me. I am the only writer/maitainer of a small discord bot which has databases.
I remember trying to implement node-sqlie3 as my database library and it was extremely difficult. Not only did I have a ton of nested callbacks, the library was confusing for a js beginner like me.
better-sqlite3 is simply amazing becuase I can write code confidently and I know that it won't fail

@ianmcorvidae
Copy link

ianmcorvidae commented Jul 27, 2020

I'd like to provide an esoteric, but possibly relevant to some, reason to use better-sqlite3: when I tried to use it and sqlite3 on a FreeBSD system, better-sqlite3 successfully set itself up and compiled the C-language sqlite3 stuff for me, where sqlite3 failed to compile and thus couldn't install. YMMV, of course!

@dschinkel
Copy link

dschinkel commented Dec 6, 2020

can we do sqlite migration calls with this?

@Xananax
Copy link

Xananax commented Dec 16, 2020

I did a migration script (stolen inspired by sqlite's one):

import fs from 'fs'
import { Database } from 'better-sqlite3'
import { join } from 'path'

type Migration = ReturnType<typeof parseMigrationFile>
type MigrateParams = Partial<ReturnType<typeof getDefaultParams>>

/**
 * Returns a parsed migration file if the file corresponds to the filename
 * schema of xxx.yyyyy.sql, where `x` is a number, and `y` anything.
 * @param root root directory
 * @param filename potential sql file
 */
const parseMigrationFile = (root: string, filename: string) => {
  const [, id, name] = filename.match(/^(\d+).(.*?)\.sql$/) || []
  if (!name) {
    return null
  }
  const [up, down] = fs
    .readFileSync(join(root, filename), 'utf8')
    .split(/^--\s+?down\b/im)
    .map((part) => part.replace(/^--.*?$/gm, '').trim())
  return { id: Number(id), name, up, down }
}

/**
 * Loops through files in a directory and extracts the migration SQL files
 * @param migrations_directory a directory containing SQL files
 */
const readMigrations = (migrations_directory: string) =>
  fs
    .readdirSync(migrations_directory)
    .reduce((acc, file) => {
      const props = parseMigrationFile(migrations_directory, file)
      return props ? [...acc, props] : acc
    }, [] as ReturnType<typeof parseMigrationFile>[])
    .sort((a, b) => a.id - b.id)

/**
 * Create a database table for migrations meta data if it doesn't exist
 * @param db the database connection
 * @param table the table name
 */
const createMigrationsTable = (db: Database, table: string) => {
  db.transaction(() => {
    db.prepare(
      `CREATE TABLE IF NOT EXISTS "${table}" (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL,
        up   TEXT    NOT NULL,
        down TEXT    NOT NULL
      )`
    ).run()
  })()
}

/**
 * Reads the migrations metatable to see if it is valid.
 * Undoes migrations that exist only in the database but not in files.
 * Starts with the latest migration and climbs up. Assumes files are
 * sequential, so if it only removes superfluous migrations, leaving the
 * rest as they are
 * @param db database connection
 * @param table table name
 * @param is_valid a function that determines if a migration is valid. If it is,
 *                 then the database is now in sync, and the function ends
 */
const syncDatabaseMigrations = (
  db: Database,
  table: string,
  is_valid: (id: number) => boolean
) => {
  // Get the list of already applied migrations
  let dbMigrations: Migration[] = db
    .prepare(`SELECT id, name, up, down FROM "${table}" ORDER BY id ASC`)
    .all()

  const remove_dbMigration = (id: number) =>
    (dbMigrations = dbMigrations.filter((migration) => migration.id !== id))

  const remove_migration = db.prepare(`DELETE FROM "${table}" WHERE id = ?`)
  const reversedDbMigrations = dbMigrations.slice().reverse()
  for (const { id, down } of reversedDbMigrations) {
    if (is_valid(id)) {
      break
    }
    db.transaction(() => {
      db.exec(down)
      remove_migration.run(id)
      remove_dbMigration(id)
    })()
  }

  const lastMigrationId = dbMigrations.length
    ? dbMigrations[dbMigrations.length - 1].id
    : 0
  return lastMigrationId
}

/**
 * Returns default parameters for the migrate function
 */
const getDefaultParams = () => ({
  table: '_meta_migrations',
  migrationsDirectory: 'migrations',
  reapplyLast: true
})

/**
 *
 * @param db a database connection
 * @param config optional configuration to specify migrations directory and/or
 *               metadata migration table name
 */
export const migrate = (db: Database, config: MigrateParams = {}) => {
  const { table, migrationsDirectory, reapplyLast } = {
    ...getDefaultParams(),
    ...config
  }

  const migrations = readMigrations(migrationsDirectory)
  const lastFileMigrationId = migrations[migrations.length - 1]?.id

  const migration_exists = (id: number) => {
    return (
      migrations.some((m) => m.id === id) &&
      (!reapplyLast || id !== lastFileMigrationId)
    )
  }

  createMigrationsTable(db, table)
  const lastDbMigrationId = syncDatabaseMigrations(db, table, migration_exists)

  // Apply pending migrations
  const add_migration = db.prepare(
    `INSERT INTO "${table}" (id, name, up, down) VALUES (?, ?, ?, ?)`
  )
  for (const { id, name, up, down } of migrations) {
    if (id > lastDbMigrationId) {
      db.transaction(() => {
        db.exec(up)
        add_migration.run(id, name, up, down)
      })()
    }
  }
}

@zzjlamb
Copy link

zzjlamb commented Mar 15, 2021

Can better-sqlite3 open, read and write to a .db file created by sqlite3?

@eslachance
Copy link

Can better-sqlite3 open, read and write to a .db file created by sqlite3?

It should absolutely be able to do that, yes, since sqlite3 is a standard file format and better-sqlite3 is just the "client" to that file :)

@yosiasz
Copy link

yosiasz commented May 18, 2021

when will it offer encryption option?

Nachtalb added a commit to Nachtalb/thelounge that referenced this issue May 22, 2021
- Which provides transaction / rollback wrapper (I heared talk on IRC that we do everyting transactional... we didn't do anything transactional)
- Is synchronous, so we can test the state of the DB beforehand (instead of after we already told everyone it's ok, like wtf why did we even check for schema version when we don't do anything about missmatchs??!)
- much much faster
- provides more functionality we might use in the future, you never know ¯\_(ツ)_/¯
- less bugs (according to better-sqlite3 team)

Also read:
https://github.com/JoshuaWise/better-sqlite3#why-should-i-use-this-instead-of-node-sqlite3
WiseLibs/better-sqlite3#262 (comment)
Nachtalb added a commit to Nachtalb/thelounge that referenced this issue May 22, 2021
- Which provides transaction / rollback wrapper (I heared talk on IRC that we do everyting transactional... we didn't do anything transactional)
- Is synchronous, so we can test the state of the DB beforehand (instead of after we already told everyone it's ok, like wtf why did we even check for schema version when we don't do anything about missmatchs??!)
- much much faster
- provides more functionality we might use in the future, you never know ¯\_(ツ)_/¯
- less bugs (according to better-sqlite3 team)

Also read:
https://github.com/JoshuaWise/better-sqlite3#why-should-i-use-this-instead-of-node-sqlite3
WiseLibs/better-sqlite3#262 (comment)
Nachtalb added a commit to Nachtalb/thelounge that referenced this issue May 22, 2021
- Which provides transaction/rollback wrapper (I heard talk on IRC that we do everything transactional... we didn't do anything transactional)
- Is synchronous, so we can test the state of the DB beforehand (instead of after we already told everyone it's ok, like wtf why did we even check for schema version when we don't do anything about mismatches??!)
- much much faster
- provides more functionality we might use in the future, you never know ¯\_(ツ)_/¯
- fewer bugs (according to better-sqlite3 team)

Also read:
https://github.com/JoshuaWise/better-sqlite3#why-should-i-use-this-instead-of-node-sqlite3
WiseLibs/better-sqlite3#262 (comment)
@mindhells
Copy link

mindhells commented Aug 23, 2022

I did a quick test for my use case comparing both.
My use case is: server app connecting to sqlite in readonly mode (I have a replication system using litestream and many reading replicas but that's not relevant for this test).
For the test I send a simple query to a single table.
I use k6 for the load testing of the server app.

node-sqlite3:
Captura de pantalla 2022-08-23 a las 18 08 13

better-sqlite:
Captura de pantalla 2022-08-23 a las 18 08 47

Not sure if it's relevant but I have the query (statement.all()) function call wrapped with a promise.

@renanoliveira0
Copy link

Hello @JoshuaWise,

When you made the initial design decision that you decided to do it synchronously rather than asynchronously, what were your reasons?

I particularly can't understand why to do something synchronous if at the time you started doing the project (correct me if I'm wrong) we already had ES6 with async await.

The only sensible and rational reason that seems to exist, in my opinion, is to avoid callback problems, right?

Is there a reason for this design decision?

I saw some comments here in the thread above that if blocking the main thread is a problem, we have this or that option. So, how blocking the main thread might not be a problem for someone in an application that is continually receiving events and might start processing a new event while waiting for an intermediate OI from a previous event.
How can that be not a problem (for a real web server application or any application that continuously receives events in an interval less than the total processing of the event(almost all apps, right?))?

@renanoliveira0
Copy link

I imagine that all the benchmarks don't take into account the time lost by blocking the main thread, right?

@JoshuaWise
Copy link
Member

JoshuaWise commented Dec 19, 2022

Hello @JoshuaWise,

When you made the initial design decision that you decided to do it synchronously rather than asynchronously, what were your reasons?

I particularly can't understand why to do something synchronous if at the time you started doing the project (correct me if I'm wrong) we already had ES6 with async await.

The only sensible and rational reason that seems to exist, in my opinion, is to avoid callback problems, right?

Is there a reason for this design decision?

I saw some comments here in the thread above that if blocking the main thread is a problem, we have this or that option. So, how blocking the main thread might not be a problem for someone in an application that is continually receiving events and might start processing a new event while waiting for an intermediate OI from a previous event. How can that be not a problem (for a real web server application or any application that continuously receives events in an interval less than the total processing of the event(almost all apps, right?))?

Already discussed here: #181 (comment)

Summary: SQLite can only do one transaction at a time, so doing it in another thread doesn't enable parallelism or anything like that, which you would expect from a normal file API. So doing that thread management is just wasted and it actually slows you down. Also, keeping a transaction open across async (await) operations is really really bad in SQLite, as it locks the database for the entire duration (maybe around 100ms for an HTTP request), where you can do the transaction synchronously in microseconds. That means much higher throughput. If you care about thread blocking, you can use a worker thread, and it will be much better than using an async API.

I imagine that all the benchmarks don't take into account the time lost by blocking the main thread, right?

The benchmarks measure throughput of SQLite operations. The concept of "blocking" being bad is only relevant when the same thread is doing many different things, which it isn't in a benchmark. Anyways, you can use SQLite in a worker thread if blocking becomes an issue. However, in many cases it won't even be an issue because, with proper indexes, most transactions can complete in microseconds (except for full table scans).

@ghost
Copy link

ghost commented Jan 26, 2023

Asynchronous database I/O is hell even for the biggest companies like Facebook.

I discovered a bug by coincidence when I was signing up on Facebook. I had a broken mouse that double clicks instead of clicking in a very small interval. When I clicked sign up, double Facebook accounts with the same email were created. Facebook doesn't allow to create a duplicate account with same email but that's what happened.

If you want asynchronous database I/O fine go ahead but don't be surprised by the amount of bugs that could occur for you.

@houd1ni
Copy link

houd1ni commented Mar 16, 2023

Wonderful explanation, @JoshuaWise !! I'd forward it to the many when they wonder 🚀
Why the issue isn't pinned yet ?

@mceachen mceachen pinned this issue Mar 16, 2023
@giacomorebonato
Copy link

Sharing some thoughts, in case it's useful or also in case I am not understanding something :)
I am evaluating this library for my personal purposes.

However, in many cases it won't even be an issue because, with proper indexes, most transactions can complete in microseconds (except for full table scans).

Mistakes just happen... if for some reason a query takes a few seconds to complete, it means that the problem has to get caught first and then the query fixed and moved to a worker.
In the meantime the application does nothing instead of doing other work, for example handle some more incoming requests in case of an API backend.

I am evaluating drizzle-orm which seems a nice option.
In the examples it shows how to connect to different types of Sqlite (like Turso or Cloudflare workers).

I see that also in those cases "async/await" is present but better-sqlite can't be used.

@ghost
Copy link

ghost commented Apr 12, 2023

Prisma uses node-sqlite3 as their SQLite engine and people are complaining about it already.

prisma/prisma#2825
prisma/prisma#12785 (Prisma is 100x slower than better-sqlite)

@joseferben
Copy link

Thank you @JoshuaWise for this package.

Apart from what was said already regarding sync vs. async and performance, the API design is just great. It really feels like an extension of the SQLite design/philosophy into the JS/Node world. The documentation is also very good.

@paleo
Copy link

paleo commented Jan 21, 2024

Summary: SQLite can only do one transaction at a time, so doing it in another thread doesn't enable parallelism or anything like that

You could have one thread per SQLite database, it would make sense. It would enable parallelism with other I/O, and with JS processing on the main thread.

It is really a bad practice to implement an I/O on the main thread.

@WassimYazbekk
Copy link

i built a pos system with it and i was lazy so i didn't have pagination and said i will add it in an update. that was 5 month ago and i may not need to add pagination for two more years after finding that the slowest pc can fetch 10M orders in 500ms

@renanoliveira0
Copy link

I'm sorry, but this just doesn't make any sense to me.

I have operations that take several seconds due to the volume of some tables. This is running on the client side. With everything being synchronous, the process is blocked for that time while it has a million other things to do, including responding to user events, since this data modification is not crucial for the application startup.

Sure, there are a million ways to solve this, but what absolutely makes no sense to me is why this wasn't solved exactly the way JS does it, without blocking the process and delivering everything out of the box? Why leave the overhead of having to solve optimization details to the developer while the runtime delivers the ready solution? Excuse me, but it makes absolutely no sense to me.

If we were to spin up workers or anything that "emulates" threads in Node, I'd rather use Python, which has much better quality libs. What's the point of using JS if we need to program in a blocking manner?

I don't understand the reasoning behind this choice.

This would only deliver more performance, more efficiency, more ease. For those who don't want concurrency issues, the language delivers everything ready to avoid race conditions.

Anyway, just venting.

@renanoliveira0
Copy link

For me, the answer to the question in this thread is, no, I'm not convinced to use better-sqlite3. For my case, the fact that it's blocking inevitably makes it absurdly less performant, and to make it performant again, I have to waste time messing with things I shouldn't need to. So, my recommendation is, use a non-blocking library; it'll certainly perform better if you're not willing to spend time dealing with these details to fine-tune things.

@renanoliveira0
Copy link

renanoliveira0 commented Apr 26, 2024

That's the architecture choice of JavaScript, non-blocking. If you need to do something blocking, in my view, unless you have an absurdly clear and robust reason, you're probably doing something wrong.

If you're worried about race conditions, just use await and you're good to go (for same context, for other contexts is not a race conditions, is only other access to other parts of DB); none of the arguments above make any sense.

@WassimYazbekk
Copy link

I'm sorry, but this just doesn't make any sense to me.

I have operations that take several seconds due to the volume of some tables. This is running on the client side. With everything being synchronous, the process is blocked for that time while it has a million other things to do, including responding to user events, since this data modification is not crucial for the application startup.

Sure, there are a million ways to solve this, but what absolutely makes no sense to me is why this wasn't solved exactly the way JS does it, without blocking the process and delivering everything out of the box? Why leave the overhead of having to solve optimization details to the developer while the runtime delivers the ready solution? Excuse me, but it makes absolutely no sense to me.

If we were to spin up workers or anything that "emulates" threads in Node, I'd rather use Python, which has much better quality libs. What's the point of using JS if we need to program in a blocking manner?

I don't understand the reasoning behind this choice.

This would only deliver more performance, more efficiency, more ease. For those who don't want concurrency issues, the language delivers everything ready to avoid race conditions.

Anyway, just venting.

I used it in electron js for a simple pos system so blocking the thread for 100ms isn't a problem considering the performance of my competition. There is no right tool for every case, i was just saying that it's very fast.

@JoshuaWise
Copy link
Member

JoshuaWise commented Apr 26, 2024

I'm sorry, but this just doesn't make any sense to me.

I have operations that take several seconds due to the volume of some tables. This is running on the client side. With everything being synchronous, the process is blocked for that time while it has a million other things to do, including responding to user events, since this data modification is not crucial for the application startup.

Sure, there are a million ways to solve this, but what absolutely makes no sense to me is why this wasn't solved exactly the way JS does it, without blocking the process and delivering everything out of the box? Why leave the overhead of having to solve optimization details to the developer while the runtime delivers the ready solution? Excuse me, but it makes absolutely no sense to me.

If we were to spin up workers or anything that "emulates" threads in Node, I'd rather use Python, which has much better quality libs. What's the point of using JS if we need to program in a blocking manner?

I don't understand the reasoning behind this choice.

This would only deliver more performance, more efficiency, more ease. For those who don't want concurrency issues, the language delivers everything ready to avoid race conditions.

Anyway, just venting.

There are plenty of plug-and-play Worker Thread libraries if you need to run queries outside the main thread. A solution like that will always perform faster than using the non-blocking node-sqlite3 library. Not to mention, if your queries are well-indexed and they run on SSDs, small queries should run in microseconds, so they're actually faster to do in the main thread because the overhead of multi-threading outweighs the query itself.

Your only argument seems to be "Node.js does it this way and therefore it's the best and only way", which is simply not true. If your only use-case is to run very slow read-only queries at a low throughput, then yeah, the node-sqlite3 library will probably be better for you. For most other use-cases, better-sqlite3 provides significant advantages. And even in that case, if you're willing to sacrifice performance, simplicity, and features, just to use a certain paradigm, then nobody here is stopping you from doing so.

@JoshuaWise
Copy link
Member

Anyway, just venting.

There's really no merit in venting on the internet about the design choices of libraries given to the public for free, which you're not paying for, and you're not forced to use. If you don't like it, you can go make your own.

Constructive criticism backed by real-world data is always welcome.

@renanoliveira0
Copy link

renanoliveira0 commented Apr 27, 2024

Don't get me wrong, @JoshuaWise , I'm not a native speaker of the language. When I searched for the term, I thought in my native language. The idea I wanted to convey here was actually the opposite. Of course, the library is yours, and you've done and continue to do an excellent service to the community by making it available to everyone. What I meant was precisely in that sense, saying that I didn't want to criticize you or your design decisions here because, after all, as you rightly said, it's your project, and no one is obligated to use it.

I've been using another one for over a year because for me, being non-blocking means real performance. I don't make so many queries that any performance gain with this or that library means anything; hardly an application running on the client side makes enough queries to justify it, but if there are heavy queries, being blocking will always cost a lot.

I think you've done and continue to do good work here, including following this thread and being willing to respond even when you wouldn't have to. I and everyone here are grateful for that.

I just came here and wanted to express this design decision of mine for my projects, so I can better inform others, not to suggest that you change anything or criticize anything. If for your project and so many other projects, being blocking doesn't cause problems, then great, but it's not the case for everyone.

I think using a solution like the one you recommended to make it non-blocking goes against the whole purpose of using a library, which is to save time and keep complexity low. If it's about making workarounds, I don't think it's worth using a lib. Simply an await should be enough, as it's not the case, I think it's another lib for me, and for those with heavy queries.

Again, I just wanted to leave this recorded here for indexing purposes and maybe help someone in the future with these points.

I'm sorry if you felt attacked or insulted by anything; I can tell you absolutely that it wasn't my intention, and thank you for the excellent work in the library!

@melroy89
Copy link

melroy89 commented May 3, 2024

Maybe a good idea to re-evaluate the benchmark from 2020 with todays result: https://github.com/WiseLibs/better-sqlite3/blob/master/docs/benchmark.md#results. I'm just curious.

I understand that doing a benchmark is not easy. And doing a good benchmark is harder. And executing a fair benchmark is even more difficult. But yeah...

@JoshuaWise
Copy link
Member

@melroy89 Here's one person who recently ran the benchmarks and got even better results than the ones advertised here: #1120

@houd1ni
Copy link

houd1ni commented May 5, 2024

@melroy89 Here's one person who recently ran the benchmarks and got even better results than the ones advertised here: #1120

Maybe update the official bench results then ?

@DanKonigsbach
Copy link

For my current situation, I am about to switch to better-sqlite3. Here's why:

I am updating an old personal Electron application that uses SQLite3. I was strongly tempted to convert from an ipcRenderer.send request/ipcMain.send response model to ipcRenderer.invoke/ipcMain.handle - it fits the logic better. BUT, that is problematic with node-sqlite3. Its database operations (get, all, etc.) are asynchronous and do not return promises. I found no obvious way to have the handler await the completion of the node-sqlite3 operation rather than return prematurely. (The send/send model doesn't have this issue, since the ipcMain.send responses are in the completion callback.)

So, better-sqlite3's synchronous model fits the Electron ipcRenderer.invoke/ipcMain.handle model well, while node-sqlite3's particular asynchronous model would be a force fit.

Your mileage may vary.

@wilyJ80
Copy link

wilyJ80 commented Aug 1, 2024

Reviving this discussion once again, taking advantage of the performance topic, to ask if running the db in memory provides extra performance by avoiding some disk IO bottleneck? Theoretically, yes, but has anyone ever tried it? I suppose this is more of a SQLite question than a better-sqlite3 question, but I'm loving the tests and benchmarks here and would love to hear some insight.

@t1u1
Copy link

t1u1 commented Dec 25, 2024

ronburk: it seems like not being able to overlap a read-only connection with another connection reserved for updates might leave significant performance gains on the table, depending on the mix of requests (recalling the UW research that suggests server throughput under load can be dramatically improved by giving priority to shorter-running requests). Ultimately, the disk itself is serialized, but we typically try to keep it loaded up with requests so it's never waiting on us.

Even if the disk i/o is serialized (which need not be true for SSD drives), there are multiple opportunities for parallelism higher up in the stack. Caches, indexes, computations can all be parallelized.

JoshuaWise: I've tried running benchmarks where async connections (node-sqlite3) were trying to perform read requests concurrently with an overlapping writer connection, but I couldn't get it to perform as well as this library.

But that doesn't address the possibility of this library improving in performance even further due to async. The overhead of switching contexts in the event loop is extremely negligible compared to the rest of the operations (caching, indexing, disk i/o). So, I agree with @ronburk, that there might be a lot being left on the table with a sync API.

JoshuaWise: The proper implementation in better-sqlite3 looks like this:

const db  = require('better-sqlite3')('data.db');

const myTransaction = db.transaction(() => {
  db.prepare('UPDATE ...').run();
  return db.prepare('SELECT ...').get();
});

You could achieve the same ergonomics from an async API using Async Context. Some of those features are relatively new, so compatibility for very old versions of node might be a problem.


Supporting an async API maybe extra work, and of course it is the choice of the maintainers whether they want to invest in it. But I think there are benefits to it.

@paleo
Copy link

paleo commented Dec 28, 2024

Ultimately, the disk itself is serialized

Another use case for parallelism: my server currently handles several dozen SQLite files. Some of them contain files such as images and PDFs. I'm reluctant to go the synchronous way for all I/O.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests