Hacker News | levkk's comments

CRUD apps don't usually delete in bulk. It's also hard to structure partitions so that dropping one doesn't wipe out months of important business data -- this is why teams often ETL their DB into Snowflake/ClickHouse and only then drop partitions. That makes it hard for the app to use that data again.

The better approach is either to change your storage engine (e.g. OrioleDB is working on adding the undo log to Pg), or to shard, which distributes the vacuum load across multiple servers.


They should be performing bulk deletions, due to GDPR: “Data must be stored for the shortest time possible.” Unless you have some kind of rolling cron checking every few minutes (and even then, depending on your scale, that may well be considered bulk), that generally resolves to something like daily or weekly deletions.
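For what it's worth, a retention job like that can be sketched as a loop of small batched deletes, so each transaction stays short. This is only an illustration: the `events` table, its `id` and `created_at` columns, and the batch size are all made up, and sqlite3 is used purely so the snippet runs on its own. On Postgres every deleted row still becomes a dead tuple for vacuum to clean up, so batching spreads the work out rather than removing it.

```python
import sqlite3


def delete_expired(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` in small batches; return the total deleted.

    Assumes a hypothetical table events(id INTEGER PRIMARY KEY, created_at INTEGER).
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN ("
            "  SELECT id FROM events WHERE created_at < ? LIMIT ?"
            ")",
            (cutoff, batch_size),
        )
        # Commit per batch so no single transaction holds locks for long.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Run from a daily or weekly cron, this is still "bulk" in the sense above; the batches only bound how much is deleted per transaction.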

You can have multiple. All sharding is config-based, so no real-time synchronization is required.

This just checks if the package is installed, not if the installed version is infected. Presumably, if you (me...) haven't run `yay -Syu` in a while (months), we're fine, right? ...Right?

Goddammit, don't make me reinstall Arch, it took me a week last time.

Update: archinstall rocks, back in business after like 15min.


I didn't realize there are _any_ managed providers of PgDog out there...do tell!

Awesome, glad it worked!

We also do that! But it's not well documented at the moment.

Thanks! Glad we made it relatively easy to migrate!

Yeah, good callout. We'll add rendezvous soon enough. Until then, being compatible with Postgres partitions has been advantageous -- while we build everything out, people have been able to migrate to PgDog for the query routing layer while doing the resharding in Postgres.

Adding a sharding function in our architecture is relatively straightforward. We also support plugins which can control the flow (and direction) for queries, so our users can add their own (and they do!).


TBH I don't think it's that straightforward; I see it more as a notable architectural change. At a very high level, this means:

* Adding a sharding function, as you say.

* Developing an external service for metadata (shard placement), or alternatively keeping that metadata in one place and replicating it (consistently!) to every query router.

* Implementing functions/catalogs for the users to understand the placement and configure/alter it.

* Implementing shard migration / rebalancing capabilities, possibly using Postgres logical replication (plus notable automation).

Here's one idea if you follow this path, something that Citus doesn't have: make the sharding function pluggable and default to one that is well-known and available in many languages (e.g. xxhash). If you do so, and guarantee the stability of those functions, applications could use them externally to route queries, and especially inserts, to the appropriate shard. While it makes the application more complex, it may allow (combined with access to the metadata service) for faster ingestion paths (this is often known as application-assisted sharding), and it's not mutually exclusive with the query routers.
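To make the idea concrete, here is a rough sketch of what the application side could look like. Everything in it is an assumption for illustration: `stable_hash`, `shard_for`, `route` and the `PLACEMENT` map are hypothetical names, the connection strings are invented, and blake2b from the standard library stands in for xxhash so the snippet runs without extra packages. It is not how PgDog or Citus actually pick a shard.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash of a sharding key.

    blake2b is a stand-in for xxhash here; any hash works as long as every
    client and every query router agree on it.
    """
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def shard_for(key: str, num_shards: int, hash_fn=stable_hash) -> int:
    """Map a key to a shard number; `hash_fn` is the pluggable part."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return hash_fn(key) % num_shards


# Hypothetical placement metadata, as it might come from a metadata service.
PLACEMENT = {
    0: "postgres://shard0.example.internal/app",
    1: "postgres://shard1.example.internal/app",
    2: "postgres://shard2.example.internal/app",
}


def route(key: str, placement=PLACEMENT) -> str:
    """Return the connection string of the shard that owns `key`."""
    return placement[shard_for(key, len(placement))]
```

An ingestion worker would call `route(customer_id)` and write straight to that shard. Plain modulo remaps most keys whenever the shard count changes, which is the usual argument for something like rendezvous hashing.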

Edit: formatting


Not yet, but actively working on this as we speak.

fwiw, we support cross-shard transactions. They are not magic though, just good old 2pc and a bit of coordination.

2pc is only safe if every part of the system has guaranteed uptime, which it never does. Assume that cross-shard transactions only work in the happy case and may result in inconsistent data otherwise.
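A toy simulation of that failure mode, for anyone who hasn't seen it spelled out. The `Participant` class and `two_phase_commit` function are invented for this example and are not PgDog code; in Postgres terms the two phases would correspond to `PREPARE TRANSACTION` and `COMMIT PREPARED`. If a shard becomes unreachable after everyone has voted yes, the coordinator is left with one shard committed and another still holding a prepared transaction.

```python
class Participant:
    """A stand-in for one shard taking part in a cross-shard transaction."""

    def __init__(self, name, can_prepare=True, fail_on_commit=False):
        self.name = name
        self.can_prepare = can_prepare
        self.fail_on_commit = fail_on_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes (and promise to be able to commit) or vote no.
        if not self.can_prepare:
            return False
        self.state = "prepared"
        return True

    def commit(self):
        # Phase 2: simulate the shard being unreachable at the worst moment.
        if self.fail_on_commit:
            raise ConnectionError(f"{self.name} unreachable")
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return (outcome, names of participants left in doubt)."""
    prepared = []
    for p in participants:
        if not p.prepare():
            # A "no" vote is the safe case: nothing has committed yet.
            for q in prepared:
                q.rollback()
            return ("aborted", [])
        prepared.append(p)

    unreachable = []
    for p in participants:
        try:
            p.commit()
        except ConnectionError:
            # This shard stays "prepared" until someone resolves it.
            unreachable.append(p.name)
    return ("in-doubt", unreachable) if unreachable else ("committed", [])
```

Until the in-doubt shard is reachable again and its prepared transaction is resolved, readers can see the write on one shard and not on the other.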

They also reduce the benefit of sharding, possibly down to worse performance than a non-sharded DB.


For sure. They should be used for "metadata"-style tables only. High-throughput writes should be direct-to-shard.
