Hacker News | stanmancan's comments

You can scaffold out a simple app pretty easily. With anything large or complex, things break down. If you don’t know what you’re doing, you end up leaking secrets, like the dozens of examples we’ve seen so far.

You know what the problem is in software engineering? A LOT of people have no clue what good software engineering is.

I previously worked at a company that was still using MD5 in 2015! Databases exposed to the internet with 5-character passwords. No tests.

A person I know would have broken the whole production DB if I hadn't stopped the PR.

Another ex-colleague thought it was okay to 'encrypt' credit card data with a basic shift cipher.

I don't think any of these companies care that much.


> You know what the problem is in software engineering? A LOT of people have no clue what good software engineering is.

Indeed. Is Mythos going to change this?


On one side, it means that a certain number of businesses will just use it even if you think it's not safe or good enough; they will lay people off and still succeed.

And on the other side: yes, because they will also use LLM review or other tooling and will be fine with whatever the 'security LLM agent' tells them.


Yes. It is going to make better decisions for people that don't know better.

Yes, the same applies to junior and inexperienced developers.

A large portion of the content on the internet is now generated by AI.

You can and do have full conversations with bots and not know. I want to interact with humans not LLMs.

There’s no way to combat it. An army of bots can post a specific rhetoric and it can and does sway people’s opinions.

The new version of Digg was shut down because they couldn’t find a way to combat AI. They were at least trying to; other platforms are just eating it up because “user activity” is a win for them.


The sloppification of the internet began before AI. Google was SEOing the open internet to death, Reddit had fully baked in a hivemind, and social media became dominated by professional influencers.

AI is accelerating this, but perhaps also backfilling what was already being lost.


AI is the same slop but cheaper. Ideally the value of slop approaches zero but the value of quality stays the same.

But what is the social damage? Can you quantify the damage, even roughly?

It's likely nearly impossible to evaluate that in the short term; I think we're looking at generational damage, much of which won't be apparent for years to come.

I’m pretty sure they’re smart enough to remember to put “make no mistakes” in their prompt.

This lands for me. I’m pushing 40 and over the last few years I’ve definitely been eliminating distractions. Anything with scrolling or algorithms meant to suck you in is gone. Deleting apps and blocking websites on my phone to prevent distractions. Phones getting much less use. Just yesterday replaced my Apple Watch with a regular watch.

I left a more detailed comment on the parent, but it's definitely not impossible!


The scenario in this post is that the first UUID was created one year before the duplicate UUID. That isn’t possible with v7.


You're heavily leaning on "collision like this" to relate to the exact time stamps for your statement to be true.

It's equally possible to read "like this" as referring to the collision itself, without a focus on the one-year distance between the creation dates.

So I guess both views are valid.


The inclusion of a timestamp in v7 makes collisions impossible unless the generating systems think that the time is the same down to the millisecond, which makes the temporal distance quite relevant.


Plenty of systems end up generating multiple UUIDs in a single millisecond.

The issue with UUIDv7 is that you also have significantly less entropy, since you only have 62 bits (sometimes less, depending on implementation) of "random" data. So while the time aspect of the format lowers the chances of collisions, two UUIDv7s generated in the same millisecond have (depending on implementation) a significantly higher chance of colliding than two UUIDv4s.

It's still incredibly unlikely, but so is generating two matching UUIDv4s, and that does happen.

TL;DR: It's possible to generate matching UUIDv7s; don't assume otherwise.


I answered this in another HN topic just the other day: https://news.ycombinator.com/item?id=48061098

But essentially, using UUID v7 you actually have less risk of collisions than with UUID v4.

Because of the birthday paradox, if you have N bits of randomness, you can expect a collision approximately after (2^((N/2)-1)) random numbers.

With v4, you have 122 bits of entropy over all time, so will see a collision after 2^60 allocations, approx 1.2 x 10^18.

With v7, you sacrifice 48 bits of entropy to give you 74 bits of entropy every millisecond, so you will see a collision after approximately 2^36 allocations per millisecond, approx 6.8 x 10^10 per millisecond.

You could argue that the risk of a collision is too high per millisecond because it's likely that 68 billion UUIDs are generated every millisecond. And maybe I'd agree. But the counter argument is that with v4 you'd expect a collision after 2^24 milliseconds, or 280 minutes, allocating at the same rate of 68 billion UUIDs per millisecond.

Obviously "all time" is longer than "280 minutes", so v7 is actually statistically less likely to cause collisions than v4, even though it seems counter-intuitive because it has a smaller space devoted to entropy. The key insight is that the time provides bits that are guaranteed to be unique, so only collisions within the same timestamp are significant, and every bit used to provide known-unique values is worth 2 bits of entropy.
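
The arithmetic above can be checked with a few lines of Python. This is only a sketch that reuses the same rough rule of thumb from the comment (a collision after about 2^((N/2)-1) random values); the function name is made up for illustration.

```python
def rough_collision_threshold(bits: int) -> int:
    """Rough count of random values before a collision is expected,
    using the 2^((N/2)-1) rule of thumb quoted above."""
    return 2 ** (bits // 2 - 1)


v4_total = rough_collision_threshold(122)   # 2^60, over all time
v7_per_ms = rough_collision_threshold(74)   # 2^36, within one millisecond

# Allocating at the v7 per-millisecond threshold rate, how long until
# v4 reaches its own all-time threshold?
v4_ms = v4_total // v7_per_ms               # 2^24 milliseconds
v4_minutes = v4_ms / 1000 / 60

print(f"v4: {v4_total:.2e} allocations in total")
print(f"v7: {v7_per_ms:.2e} allocations per millisecond")
print(f"v4 at that rate: {v4_ms} ms, about {v4_minutes:.0f} minutes")
```

The exact birthday bound carries a slightly different constant, but the ratio between the two schemes is what matters here.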


Sorry if I worded that poorly; you’re definitely less likely to run into a collision with v7, but it’s not impossible, which is what I was trying to point out.

Thanks for a more articulate answer!


Surely the scenario where he generates the same number of items as he did between 2025 and now, but within a single tick of v7 UUIDs, also runs into it?


The scenario being the collision itself, the time period isn’t particularly relevant aside from it occurring much quicker than expected.


It's still possible in most implementations of UUIDv7.

UUIDv7 assigns the first 48 bits to the timestamp in milliseconds. You can generate a lot of UUIDs in a millisecond though!

Then you have another 12 bits that you can use as you wish: "rand_a". The spec suggests a few ways to use these bits, including 12 bits of random data, sub-millisecond timestamps, or a monotonic counter, but each has its downsides:

- Purely random data means you can still run into collisions, and anything within the same millisecond is unordered.

- With sub-millisecond timestamps you can still run into collisions; there's nothing stopping you from generating two UUIDs with the same 62 bits of rand_b data in the same sub-millisecond timestamp.

- Monotonic counters can overflow before the next tick, then what? Rollover? Once you roll over it's no longer monotonic and you can generate the same random data within the same monotonic cycle. Also, it's only monotonic to the system that's generating the UUID. If you have a distributed system and each node has its own monotonic cycle, then you'll be generating UUIDs with the same timestamp + monotonic counter, and again, are relying on not generating the same random data.

You can steal some of the 62 bits in rand_b if you want as well; you can use rand_a for sub-millisecond accuracy, and then use a few bits of rand_b for a monotonic counter. There's still a chance of collision here, but it's exceedingly low at the expense of less truly random data at the end.

If you want it to be truly collision-free, you'd also need to assign a couple of bits to identify the subsystem generating the UUID so that the monotonic counter is unique to that subsystem. You lose the ordering part of the monotonic counter this way though, but I guess you could argue that in nearly 100% of cases the accuracy of sub-millisecond order in a distributed system is a lie anyways.
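
To make the layout concrete, here is a minimal sketch that packs the fields by hand, assuming the RFC 9562 layout (48-bit millisecond timestamp, 4-bit version, 12-bit rand_a, 2-bit variant, 62-bit rand_b). `pack_uuid7` is a hypothetical helper, not a library API, and by default it fills both rand fields with random data (the first option in the list above).

```python
import secrets
import time
import uuid
from typing import Optional


def pack_uuid7(ts_ms: Optional[int] = None,
               rand_a: Optional[int] = None,
               rand_b: Optional[int] = None) -> uuid.UUID:
    """Hypothetical helper: pack the UUIDv7 fields into 128 bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    if rand_a is None:
        rand_a = secrets.randbits(12)
    if rand_b is None:
        rand_b = secrets.randbits(62)
    value = (ts_ms & (2**48 - 1)) << 80   # 48-bit Unix timestamp in ms
    value |= 0x7 << 76                    # 4-bit version field (7)
    value |= (rand_a & 0xFFF) << 64       # 12-bit rand_a
    value |= 0b10 << 62                   # 2-bit variant field
    value |= rand_b & (2**62 - 1)         # 62-bit rand_b
    return uuid.UUID(int=value)


# Same millisecond plus same random bits gives the same UUID;
# nothing in the format itself prevents that.
a = pack_uuid7(ts_ms=1_700_000_000_000, rand_a=0x123, rand_b=42)
b = pack_uuid7(ts_ms=1_700_000_000_000, rand_a=0x123, rand_b=42)
print(a, a == b)
```

Swapping the `rand_a` argument for a counter or a sub-millisecond fraction gives the other two options, with the trade-offs described in the list.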


I think by the time you're building a system that needs to generate (and persist!) billions of identifiers per millisecond, you're solidly past the point where all your design decisions need to be vetted for whether they make sense on your extremely exotic setup.


But 12 bits is not "billions of identifiers" -- it's 4096. Once you exhaust that counter in the same millisecond, you are still relying on a gamble that your random source will not generate the exact same bit sequence it produced the last time that counter value came up. And this thread started out with the OP explaining that random collisions are much more common than we'd like them to be, for various reasons.


We have a dedicated snowflake ID generator service that returns IDs in batches. It's also distributed: each service adds its own instance number to the ID. When it overflows, it just blocks until the next ms. For our traffic, it's never a bottleneck.
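
A rough sketch of that block-on-overflow behaviour follows. The bit widths (millisecond timestamp in the high bits, 10-bit instance number, 12-bit sequence), the class name, and the injectable clock are assumptions made for illustration, not details of the service described above.

```python
import time
from typing import Callable


class SnowflakeSketch:
    """Illustrative ID layout: timestamp | instance | sequence."""

    INSTANCE_BITS = 10
    SEQ_BITS = 12

    def __init__(self, instance: int,
                 clock_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000):
        if not 0 <= instance < (1 << self.INSTANCE_BITS):
            raise ValueError("instance number out of range")
        self.instance = instance
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        # Never let the timestamp move backwards.
        now = max(self.clock_ms(), self.last_ms)
        if now == self.last_ms:
            self.seq += 1
            if self.seq >= (1 << self.SEQ_BITS):
                # Sequence exhausted for this millisecond:
                # busy-wait until the clock reaches the next one.
                while now <= self.last_ms:
                    now = self.clock_ms()
                self.seq = 0
        else:
            self.seq = 0
        self.last_ms = now
        return ((now << (self.INSTANCE_BITS + self.SEQ_BITS))
                | (self.instance << self.SEQ_BITS)
                | self.seq)


gen = SnowflakeSketch(instance=3)
print(gen.next_id(), gen.next_id())
```

Because the instance number is part of every ID, two generators with different instance numbers cannot collide even when they share a timestamp and sequence value.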


Something I use on my own distributed system (where I wanted 64-bit IDs) is to use 32 bits for the time in seconds (with an epoch from 2020, so good until 2088), 8 bits for the device ID and 24 bits for a serial number (which resets to 0 every time the seconds field increments).

That's generally enough IDs per second for most of my edge nodes, but the central worker nodes need more, so I give them a different split and use 4 bits for the device ID and 28 bits for serial number instead.

If a node overflows its serial number that second, I kind of cheat and increment the seconds field early. Every time this happens, I persist the seconds field to the database, and when the app restarts, it starts its seconds count at the last persisted seconds plus one. If the current time in seconds is greater than the last used seconds, I also update it and reset the serial number. Works remarkably well for smoothing out very occasional spikes in ID generation while still approximately remaining globally sortable.

I also "waste" a bit of the 32-bit time field by considering it to be signed, even though it's not really, because I don't expect this system to last long enough to reach times where the MSB gets set. But if I ever change my system, I'll set that bit and everything will stay ordered. I'll probably reset the epoch at that point too.
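
The scheme described above could be sketched roughly like this. The class and parameter names are hypothetical, the epoch is assumed to be 1 January 2020 UTC, and the persistence of the seconds field across restarts is left out.

```python
import time
from datetime import datetime, timezone
from typing import Callable

# Assumption: the "epoch from 2020" is midnight UTC on 1 January 2020.
EPOCH_2020 = int(datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp())


class SecondsIdSketch:
    """Illustrative 64-bit IDs: 32-bit seconds | device ID | serial."""

    def __init__(self, device_id: int, device_bits: int = 8,
                 clock_s: Callable[[], int] = lambda: int(time.time()) - EPOCH_2020):
        if not 0 <= device_id < (1 << device_bits):
            raise ValueError("device ID out of range")
        self.serial_bits = 32 - device_bits
        self.device_id = device_id
        self.clock_s = clock_s
        self.seconds = clock_s()
        self.serial = 0

    def next_id(self) -> int:
        now = self.clock_s()
        if now > self.seconds:
            # Real time has passed the last used second: catch up and reset.
            self.seconds, self.serial = now, 0
        elif self.serial >= (1 << self.serial_bits):
            # Serial exhausted: increment the seconds field early.
            # (The described system also persists the seconds field here.)
            self.seconds, self.serial = self.seconds + 1, 0
        new_id = ((self.seconds << 32)
                  | (self.device_id << self.serial_bits)
                  | self.serial)
        self.serial += 1
        return new_id


edge = SecondsIdSketch(device_id=7)                    # 8-bit device, 24-bit serial
worker = SecondsIdSketch(device_id=3, device_bits=4)   # 4-bit device, 28-bit serial
print(hex(edge.next_id()), hex(worker.next_id()))
```

Keeping the seconds field in the top 32 bits is what makes IDs from different devices sort approximately by creation time.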


Unfortunately, people are inherently lazy. Curious and driven individuals will excel with the availability of LLMs, but the majority will atrophy.



I'm going to give Apple the benefit of the doubt here until proven otherwise. I can't see them releasing something with a terrible user experience as it would cause a lot of reputational harm.


> I can't see them releasing something with a terrible user experience

I see you haven't upgraded to Tahoe yet!


It's cheap for what you get.

If you just need "a small box to make API calls and do minimal local processing" you can also just buy an RPi for a fraction of the price of the GMKtec G10.

All 3 serve a different purpose; just because you can buy a slower machine for less doesn't mean the price:performance of the M1 Mac Mini changes.


> you can also just buy an RPi for a fraction of the price of the GMKtec G10.

Sadly not really. The Pi 5 8GB CanaKit starter set, which feels like a truer price since it includes a power supply, MicroSD card, and case, is now $210. The Pi 5 8GB by itself is $135.

A 16GB Pi 5 kit, to match just the RAM capacity, to say nothing of the difference in storage {size, speed, quality} and networking, is then also an eye-watering $300.


> Sadly not really. The Pi 5 8GB CanaKit starter set, which feels like a truer price since it includes a power supply, MicroSD card, and case, is now $210. The Pi 5 8GB by itself is $135.

At that point, buy a used MacBook Air M1.


> you can also just buy an RPi for a fraction of the price

lol. you need to look at rpi 5 prices again. they are insane.

