I'll argue for the +0.5 solution. First, I don't like half-sized intervals at the edges, and second, a 255-based representation is typically an SDR (not HDR) image.
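For reference, here is a minimal sketch of the two mappings being argued about. It assumes the "standard" scheme encodes with round-half-up of x*255 and decodes with v/255, and that the "+0.5" scheme encodes with floor(x*256) (clamped to 255) and decodes to bin centres, (v + 0.5)/256. The function names are hypothetical.

```python
import math

def encode_standard(x: float) -> int:
    # Round half up onto 0..255; bins 0 and 255 are only half-width inside [0, 1].
    return min(255, max(0, math.floor(x * 255 + 0.5)))

def decode_standard(v: int) -> float:
    # 0 -> 0.0 and 255 -> 1.0 exactly.
    return v / 255

def encode_plus_half(x: float) -> int:
    # 256 equal-width bins over [0, 1]; x == 1.0 would land in bin 256, so clamp.
    return min(255, max(0, math.floor(x * 256)))

def decode_plus_half(v: int) -> float:
    # Reconstruct at the centre of each bin: (v + 0.5) / 256.
    return (v + 0.5) / 256

# Both schemes round-trip every code.
assert all(encode_standard(decode_standard(v)) == v for v in range(256))
assert all(encode_plus_half(decode_plus_half(v)) == v for v in range(256))
```

The +0.5 scheme never decodes to exactly 0.0 or 1.0, which is precisely the trade-off the rest of this thread argues about.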
RGB values represent luminances against some adapted state, and a "zero" in a daylit scene is not "zero luminance" - it's just about 0.001x as bright as the brightest point - it's millions of photons, way more than zero. In a sense our eyes experience contrast on a sliding scale, and there is no absolute zero in the system. For example, broadcast systems historically used 16-235 as their luminance range for SDR. I think any argument that says "we must have zero" is going to have a bias, but I don't think zero is needed for most things.
As someone with a lot of experience in this area, doing image processing and rendering for VFX (including writing image readers and writers for my own software and for commercial VFX software), I think you might be forgetting that colourspace conversion (to 'linear' sRGB/Rec. 709 for old-school SDR, or to other, wider gamuts for newer formats) would happen after this, so the 'squish' of the dynamic range would happen after loading.
Also, a lot of image-processing and compositing workflows do assume that 0 means zero, whether correctly or not (often incorrectly). So there are often assumptions that, for 8-bit, 0u maps to 0.0f and 255 maps to 1.0f for things like masking or alpha. As soon as 0 values become just over 0.0, you get artifacts, because some code somewhere uses a hard threshold of 0.0 to mask some other operation. The reverse happens with 1.0 and alpha: because 255 values are suddenly no longer 1.0f, you get very slightly see-through objects after premultiplication (often only visible in certain situations or when pixel-peeping).
(The same thing can happen with masking when 254 becomes 1.0f after the +0.5.)
I think, more to the point, if 0 doesn't represent 0.0 and 255 doesn't represent 1.0, congratulations: you've just lost your additive and multiplicative identities, and most of the math used in colour falls apart.
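A small illustration of the alpha and masking points, assuming straight (non-premultiplied) "over" compositing on a single channel and the two decodes discussed here (v/255 vs. (v + 0.5)/256). With the standard decode, alpha 255 is exactly 1.0 and the background contributes nothing; with the +0.5 decode the same "opaque" pixel lets a sliver of background through, and code 0 no longer passes a hard "== 0.0" test.

```python
def over(fg: float, bg: float, alpha: float) -> float:
    # "Over" operator for one channel with straight (non-premultiplied) alpha.
    return fg * alpha + bg * (1.0 - alpha)

alpha_std = 255 / 255           # standard decode: exactly 1.0
alpha_half = (255 + 0.5) / 256  # +0.5 decode: just below 1.0

fg, bg = 0.0, 1.0  # black foreground over a white background
print(over(fg, bg, alpha_std))   # 0.0: fully opaque
print(over(fg, bg, alpha_half))  # 0.001953125: background leaks through

mask_black = (0 + 0.5) / 256  # code 0 under the +0.5 decode
print(mask_black > 0.0)       # True: a hard "> 0.0" mask test now fires on black
```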
The argument for 0-256 feels compelling when thinking about the physical display, but it seems like a very poor fit for any digital image processing or rendering.
> Remember how the 0 and 255 bins poked slightly beyond the [0,1] range's edges? In the standard approach, the range of representable values is actually [−0.5/255, 255.5/255], meaning the bins are spaced further apart than strictly needed for [0,1] inputs
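To make the quoted range concrete, here is a sketch assuming the standard round-half-up encoder: code v covers inputs in [(v − 0.5)/255, (v + 0.5)/255), so the outer edges of bins 0 and 255 sit at −0.5/255 and 255.5/255, and only half of each of those two bins overlaps [0, 1]. The helper names are hypothetical.

```python
def bin_edges_standard(v: int) -> tuple[float, float]:
    # Inputs x with round-half-up(x * 255) == v lie in [lo, hi).
    return (v - 0.5) / 255, (v + 0.5) / 255

def width_inside_unit_interval(v: int) -> float:
    # Portion of code v's bin that falls inside [0, 1].
    lo, hi = bin_edges_standard(v)
    return min(hi, 1.0) - max(lo, 0.0)

print(bin_edges_standard(0)[0], bin_edges_standard(255)[1])  # -0.5/255 and 255.5/255
print(width_inside_unit_interval(0) * 255)    # ~0.5: half a bin
print(width_inside_unit_interval(128) * 255)  # ~1.0: a full bin
```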
This is of course silly: the "range of representable values" of floating point colour components is [0,1] independent of quantization and how an invalid input would be quantized is irrelevant.
Looking at the actual "big picture", there are 256 representable values, and (taking into account gamma correction, arbitrary ranges other than [0,1], deliberately nonuniform quantization bins, and other plausible complications) their correspondence to 256 floating-point values should be regarded as a generic lookup table. That means abandoning all hope of elegant and cheap formulas, and it makes it obvious that encoding and decoding differently is not an option.
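One way to read "a generic lookup table", sketched under assumptions: decoding is a 256-entry table (filled here with the sRGB transfer function purely as an example of a non-linear mapping), and encoding picks the nearest table entry by bisecting the midpoints between neighbouring entries, so encode and decode stay consistent by construction. "Nearest in linear value" is itself a choice; other decision rules are equally possible.

```python
import bisect

def srgb_to_linear(c: float) -> float:
    # sRGB transfer function (IEC 61966-2-1), encoded value -> linear light.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

# Decode table: code -> linear float value (strictly increasing).
DECODE = [srgb_to_linear(v / 255) for v in range(256)]

# Decision thresholds halfway between adjacent table entries.
THRESHOLDS = [(a + b) / 2 for a, b in zip(DECODE, DECODE[1:])]

def decode(v: int) -> float:
    return DECODE[v]

def encode(x: float) -> int:
    # The number of thresholds <= x is the index of the nearest table entry.
    return bisect.bisect_right(THRESHOLDS, x)

# Encoding and decoding share one table, so every code round-trips.
assert all(encode(decode(v)) == v for v in range(256))
```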
Although the post focuses on RGB, the same quantization issue exists for any type of signal being mapped between discrete and continuous representations.
The issue isn't having a representation for 0 photons, but maximizing the information stored in a byte. Ideally you shouldn't underutilize the byte value 0, nor add bias to data that should have been assigned to the 0th bucket, regardless of what it represents (you could have a color space that goes from bright to super bright, and still want every byte value to represent an equal chunk of your brightness range).
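A quick way to see the utilization argument, assuming a deterministic, evenly spaced set of inputs in (0, 1): the standard round-half-up(x·255) encoder gives codes 0 and 255 half as many samples as the interior codes, while floor(x·256) fills all 256 codes equally.

```python
import math
from collections import Counter

N = 2 * 255 * 256  # sample count chosen so no sample lands exactly on a bin edge
samples = [(i + 0.5) / N for i in range(N)]  # evenly spaced points in (0, 1)

standard = Counter(math.floor(x * 255 + 0.5) for x in samples)
plus_half = Counter(min(255, math.floor(x * 256)) for x in samples)

print(standard[0], standard[128], standard[255])          # 256 512 256
print(min(plus_half.values()), max(plus_half.values()))  # 510 510
```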
Yep, the exact same problem arises in digital audio, mapping between integer sample formats and the floating point representation that is generally used internally.
> In a sense our eyes experience contrast on a sliding scale
There's a whole visual center to check the amount of incoming light and adjust your pupils for you. It's intentionally reactive.
> and there is no absolute zero in the system.
There maybe is. I think we call that "blind."
> broadcast systems historically used 16-235 as their luminance range for SDR
Mostly because it was a fully analog system and these all translate down to signal voltage. Jokingly NTSC used to be referred to as "Never Twice the Same Color" due to being a compromise bolted onto the side of an already compromised system.
They don't: trunc(result * 255 + 0.5) is just round(result * 255), i.e. the error is 0 on average, whereas trunc(result * 256) has an average bias of -0.5.
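The bias claim is easy to check numerically. This sketch assumes evenly spaced inputs in (0, 1) and measures error in code units (quantized value minus the unquantized scaled value).

```python
import math

N = 2 * 255 * 256
xs = [(i + 0.5) / N for i in range(N)]  # evenly spaced inputs in (0, 1)

# round(x * 255) written as trunc(x * 255 + 0.5); x >= 0, so floor == trunc here.
err_round = math.fsum(math.floor(x * 255 + 0.5) - x * 255 for x in xs) / N
# trunc(x * 256) with no +0.5 added back on decode.
err_trunc = math.fsum(math.floor(x * 256) - x * 256 for x in xs) / N

print(err_round, err_trunc)  # approximately 0 and approximately -0.5
```

Decoding the trunc(x·256) codes at bin centres, (v + 0.5)/256, adds that half step back and cancels the bias.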
> For example, broadcast systems historically used 16-235 as their luminance range for SDR.
Unfortunately "modern" HDMI is still plagued by this insanity, so if your display and source don't agree, you get either washed-out or crushed blacks.
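A sketch of the two failure modes, assuming 8-bit limited-range (16-235) luma and a simple full-range expansion with round-half-up and clipping; real sources and sinks may also handle chroma, dithering and headroom differently.

```python
import math

def limited_to_full(v: int) -> int:
    # Expand limited range 16..235 to full range 0..255, rounding half up and clipping.
    return min(255, max(0, math.floor((v - 16) * 255 / 219 + 0.5)))

# Limited-range source, display assumes full range: black stays at code 16 (~6% of full scale).
print(16 / 255)  # washed-out blacks

# Full-range source, display assumes limited and expands: codes 0..16 all collapse to 0.
print([limited_to_full(v) for v in (0, 8, 16, 17)])  # [0, 0, 0, 1]
```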
Interesting idea, but somehow I feel the world is shaking. For the processing program, what used to be black (0.0) and white (1.0) has become very dark gray and very bright gray.
For 8-bit, 16 maps to 7.5 IRE, which is the well-understood legal black. Mapping 235 means they mapped peak to 110 IRE. This is based on a 0-120 IRE scale. It gets weird because the broadcast limit for video was 100 IRE, allowing the chroma to reach 110 IRE. So if you're limiting your white values to 235, that'll be higher than is broadcast safe. Of course, nobody cares about NTSC broadcast limits any more. However, I still see out-of-spec tapes marked as "broadcast master" that have been ingested for streaming use. It drives me crazy to this day, and it's only getting worse as people don't even have scopes to adjust the VTR's TBC properly.
The "16" digital black level is independent of the "7.5 IRE" analog setup. E.g. in Japan with an 8-bit "NTSC-J" Rec. 601 system, my understanding is that 16 still maps to E'Y = 0 which is now at 0 IRE, and 235 is still E'Y = 1 at 100 IRE.
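For reference, the Rec. 601/709 digital luma quantization is D'Y = INT[(219·E'Y + 16)·2^(n−8)], where INT rounds to the nearest integer (halves up), so codes 16 and 235 correspond to E'Y = 0 and 1 independently of any analog setup. A minimal sketch; the function name is hypothetical.

```python
import math

def luma_code(e_y: float, bits: int = 8) -> int:
    # D'Y = INT[(219 * E'Y + 16) * 2**(bits - 8)], with INT = round half up.
    return math.floor((219 * e_y + 16) * 2 ** (bits - 8) + 0.5)

print(luma_code(0.0), luma_code(1.0))          # 16 235
print(luma_code(0.0, 10), luma_code(1.0, 10))  # 64 940
```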
Ugh. Sudden flashbacks to having to switch analog output between Japanese NTSC (no pedestal) and US NTSC (with pedestal) without getting weird noise in the black regions.
But IIRC the MPEG-2 standard had luma==235 -> 100IRE for all of the analog formats (pal/ntsc-j/ntsc/secam) so I'm not sure why you say that would violate the broadcast limits?
Simply because the math works out: with a simple proportional scaling, 7.5 IRE on a 120 IRE scale maps to 16 in 8-bit, and 110 IRE maps to 235. Getting 235 to match 100 IRE means some other scaling (one with an offset rather than a straight proportion). At that point, I stuck with the linear scale and moved on with a keep-it-simple-stupid mindset.
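The arithmetic in question, sketched under the assumption of a purely proportional map from a 0-120 IRE scale onto 8-bit codes (no offset). With a ×255 factor, 7.5 IRE and 110 IRE land at 15.94 and 233.75; with ×256 they land at exactly 16 and about 234.7. An offset (affine) map, by contrast, can pin 16 to 7.5 IRE and 235 to 100 IRE at the same time. Function names are hypothetical.

```python
def ire_to_code(ire: float, scale: int) -> float:
    # Proportional mapping: 0 IRE -> 0, 120 IRE -> scale.
    return ire * scale / 120

def ire_to_code_affine(ire: float) -> float:
    # Offset mapping through (7.5 IRE, code 16) and (100 IRE, code 235).
    return 16 + (ire - 7.5) * 219 / 92.5

print(ire_to_code(7.5, 255), ire_to_code(110, 255))      # 15.9375 233.75
print(ire_to_code(7.5, 256), ire_to_code(110, 256))      # 16.0 and about 234.67
print(ire_to_code_affine(7.5), ire_to_code_affine(100))  # 16.0 235.0
```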
I agree. Additionally, neither 0.0 nor 1.0 really exists for dithered signals, so a byte should map to [0.5, 255.5] before division by 256. This also solves the signed-integer asymmetry, as a signed byte maps to [-127.5, 127.5] before division by 128. I wonder if audio DSP folks have done this already.
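A sketch of the mapping described above, assuming unsigned codes 0..255 and signed codes −128..127. For comparison, the common v/128 convention for signed samples reaches −1.0 but tops out at 127/128. Function names are hypothetical.

```python
def decode_u8(v: int) -> float:
    # 0..255 -> [0.5, 255.5] / 256: never exactly 0.0 or 1.0.
    return (v + 0.5) / 256

def decode_s8(v: int) -> float:
    # -128..127 -> [-127.5, 127.5] / 128: symmetric around zero.
    return (v + 0.5) / 128

def decode_s8_common(v: int) -> float:
    # Common convention: -128 -> -1.0 but 127 -> 127/128 (asymmetric).
    return v / 128

print(decode_s8(-128), decode_s8(127))                # -0.99609375 0.99609375
print(decode_s8_common(-128), decode_s8_common(127))  # -1.0 0.9921875
```

One consequence of the symmetric version: digital silence (code 0) decodes to 0.5/128 rather than exactly 0.0.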
It is still a frequency, where it would have negative values, but I doubt any color-handling algorithms treat it as a frequency. Rightfully so: the physical wetware for decoding images is very different from that for decoding audio. Well... not that different if you think of audio as a single-pixel monochrome image.
Now I am imagining a weird alternate history where we treat audio like we treat color. OK: take three bytes that encode how loud the sound is, one for lows, one for mids and one for highs, where the low, mid and high frequencies are picked to match the human ear's response.
It was never end-to-end: it was you to the server, then the server to the other party. Meaning Zuky boi always had access to your messages in the clear (and so did the NSA and all the other three-letter agencies).
Except this math is 10x too high (unless accelerated depreciation is all of it) - a million tokens at 28 tokens/sec and 75W and 20c/kwh should cost $0.15 not $1.50. (And less with MTP.)
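The electricity-only figure works out as follows, using the numbers in the comment (1M tokens, 28 tokens/s, 75 W, $0.20/kWh) and ignoring hardware depreciation:

```python
tokens = 1_000_000
tokens_per_second = 28
watts = 75
dollars_per_kwh = 0.20

seconds = tokens / tokens_per_second  # ~35,714 s, about 9.9 hours
kwh = watts * seconds / 3_600_000     # joules -> kWh: ~0.744 kWh
cost = kwh * dollars_per_kwh          # ~$0.149

print(f"{seconds / 3600:.1f} h, {kwh:.3f} kWh, ${cost:.2f}")  # 9.9 h, 0.744 kWh, $0.15
```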
2. The "description.md" example has things like "faces -> cluster_id". Is this from DaVinci Resolve's face index? Things like faces+names and locations are really important with photo collections, but general LLMs don't handle them so well.
1) It's just simple plain-text `.description.md` sidecar files, one per clip, sitting next to each video.
Something I can query later. For example, when brainstorming with Claude ("I wanna make some videos of the Luxury rooms in the lodge"), it knows which videos could help by going through the files.
There are also root-level files in the folder that aggregate the text descriptions to make them easier to find.
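A hypothetical sketch of the sidecar-plus-aggregate idea, not framedex's actual code: it assumes every sidecar ends in `.description.md` and writes one root-level index file (the name `descriptions_index.md` is made up) that concatenates them under their relative paths.

```python
import tempfile
from pathlib import Path

def build_index(root: Path, index_name: str = "descriptions_index.md") -> Path:
    # Collect every *.description.md sidecar under root (naming is an assumption).
    sidecars = sorted(p for p in root.rglob("*.description.md") if p.is_file())
    parts = []
    for sidecar in sidecars:
        rel = sidecar.relative_to(root)
        parts.append(f"## {rel}\n\n{sidecar.read_text(encoding='utf-8').strip()}\n")
    # The index name does not end in .description.md, so rebuilds never re-ingest it.
    index = root / index_name
    index.write_text("\n".join(parts), encoding="utf-8")
    return index

# Usage: build an index over a throwaway folder containing one sidecar.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lodge").mkdir()
    (root / "lodge" / "suite_01.description.md").write_text(
        "Luxury suite, wide shot.\n", encoding="utf-8"
    )
    index_text = build_index(root).read_text(encoding="utf-8")

print(index_text)
```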
2) No - nothing from DaVinci Resolve. Framedex is a standalone pipeline. Resolve isn't involved.
Faces come from insightface (the open-source buffalo_l pack - RetinaFace for detection), running locally on CPU. For each clip it detects faces in the sampled frames, embeds them, and writes rows to ~/.framedex/faces.db.
Tbh, I know this part is building up in my local DB, but I haven't tested how good it is. Will check it out properly soon.
But yeah, on your broader point that's why framedex deliberately does not ask the LLM to handle faces or locations.
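For the "faces -> cluster_id" idea, a hypothetical sketch rather than framedex's or insightface's code: face embeddings are typically compared by cosine similarity, and a simple greedy pass can turn them into cluster ids. The 0.5 threshold is an illustrative assumption, not a tuned value, and embeddings are assumed to be non-zero.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def assign_clusters(embeddings: list[list[float]], threshold: float = 0.5) -> list[int]:
    # Greedy: join the first cluster whose representative is similar enough,
    # otherwise start a new cluster with this embedding as its representative.
    reps: list[list[float]] = []
    ids: list[int] = []
    for emb in embeddings:
        for cid, rep in enumerate(reps):
            if cosine_similarity(emb, rep) >= threshold:
                ids.append(cid)
                break
        else:
            reps.append(emb)
            ids.append(len(reps) - 1)
    return ids

print(assign_clusters([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))  # [0, 0, 1]
```

Because the embeddings are deterministic, the same face should land in the same cluster across clips, which is the property the comment relies on; a greedy pass is order-dependent, though, so a real pipeline might prefer a proper clustering algorithm.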
----
Faces → insightface / ArcFace embeddings. Deterministic, comparable across clips. The vision model only contributes a rough people_count; it never tries to identify anyone.
Locations → EXIF GPS via exiftool, reverse-geocoded through Nominatim/OpenStreetMap. Hard metadata, not a guess.
The LLM only does what it's good at: scene description, mood, shot type, keywords, keep/review/cull rating (this last part is also debatable though).
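For the location path, a sketch assuming the GPS coordinates have already been pulled out of the EXIF data as degrees/minutes/seconds plus an N/S/E/W reference. It only builds the request URL for Nominatim's public reverse-geocoding endpoint and makes no network call; Nominatim's usage policy asks clients to send an identifying User-Agent and keep request rates low. The example coordinates are arbitrary.

```python
from urllib.parse import urlencode

def dms_to_decimal(deg: float, minutes: float, seconds: float, ref: str) -> float:
    # South and West references are negative in decimal degrees.
    value = deg + minutes / 60 + seconds / 3600
    return -value if ref.upper() in ("S", "W") else value

def nominatim_reverse_url(lat: float, lon: float) -> str:
    # Nominatim reverse-geocoding endpoint; format=jsonv2 requests a JSON response.
    query = urlencode({"lat": f"{lat:.6f}", "lon": f"{lon:.6f}", "format": "jsonv2"})
    return f"https://nominatim.openstreetmap.org/reverse?{query}"

lat = dms_to_decimal(48, 51, 29.6, "N")
lon = dms_to_decimal(2, 17, 40.2, "E")
print(nominatim_reverse_url(lat, lon))
```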
Hang on, this is a $2.99 one-time payment - below the level you can even buy ads and make a profit. There is no way he's trying to make billions of dollars this way, and it's honest and smart. Consider the perspective - what happens with the "free alternatives"? You know they're not free: either they already track you, or someone buys them and turns them into spyware. We need more things like this, not less.
The aspect ratios were much "taller" back then, which was kind of better for editing code. All these late-90s designs were near NTSC at the time: aspect ratios like 1.25:1 (1280x1024) or 1.33:1 (1024x768). Monitors have always followed TVs; displays now use the "HD" ratio of 16:9 (1.78:1), or 16:10 if we're lucky. But we do get way more pixels now anyway.
Use portrait displays! You'll never complain about the height again. Use 1440p (or scaled equivalent) and the width won't be an issue either. Many displays make this easy (all the Dells I've had can rotate), and adequate rotating stands aren't necessarily expensive.
If you back up to "intention" it's fully insane to make a GDPR argument against on-device AI. Yes it downloads bits, but those bits are not there to identify you - they are basically a local copy of the internet. This enables private data to be kept on-device. Having no personal data leave the device is fantastic for GDPR compliance.
The good point in this article is about how the "AI" features in Chrome all use Google's cloud API and not a local model. That's true and some of it should be local. ("AI mode" uses the Web index, so it fundamentally cannot be local, but there are features that could be.)
Chrome seems to use a custom inference runtime also (in addition to Gemini Nano). It would be better if this were all interoperable. The WebGPU alternatives like WebLLM do not have the same access.
I've been trying these models out for the last year, and it seems to me that we want them to work in a 5-10W "laptop" power envelope, but they really work best with a 50-500W GPU instead - i.e. they eat batteries. This means things work better in a "plugged in" gaming laptop/desktop rather than a typical web client. At least for now.