
> Whereas Twitter stores home timelines in a dedicated in-memory database, in Rama they’re stored in-memory in the same processes executing the ETL for timeline fanout. So instead of having to do network operations, serialization, and deserialization, the reads and writes to home timelines in our implementation are literally just in-memory operations on a hash map. This is dramatically simpler and more efficient than operating a separate in-memory database. The timelines themselves are stored like this:

> To minimize memory usage and GC pressure, we use a ring buffer and Java primitives to represent each home timeline. The buffer contains pairs of author ID and status ID. The author ID is stored along with the status ID since it is static information that will never change, and materializing it means that information doesn’t need to be looked up at query time. The home timeline stores the most recent 600 statuses, so the buffer size is 1,200 to accommodate each author ID and status ID pair. The size is fixed since storing full timelines would require a prohibitive amount of memory (the number of statuses times the average number of followers).
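For readers who want to picture the structure the quote describes, here is a minimal sketch of a fixed-size ring buffer of (author ID, status ID) pairs backed by a single primitive array. The class name, method names, and the choice of `long[]` are assumptions made for illustration; the post only says "a ring buffer and Java primitives", so this is not Rama's actual code.

```java
/** Hypothetical sketch: a home timeline as a ring buffer of (authorId, statusId) pairs. */
final class HomeTimelineBuffer {
    static final int MAX_STATUSES = 600; // figure taken from the quoted post

    // One flat primitive array: 600 pairs -> 1,200 slots, no per-entry objects for the GC.
    private final long[] slots = new long[MAX_STATUSES * 2];
    private int next = 0; // index of the pair that the next add() will write
    private int size = 0; // number of pairs currently stored

    /** Appends a pair, overwriting the oldest one once the buffer is full. */
    void add(long authorId, long statusId) {
        slots[2 * next] = authorId;
        slots[2 * next + 1] = statusId;
        next = (next + 1) % MAX_STATUSES;
        if (size < MAX_STATUSES) {
            size++;
        }
    }

    int size() {
        return size;
    }

    /** Returns up to `limit` pairs, newest first, flattened as [author, status, author, status, ...]. */
    long[] newest(int limit) {
        int n = Math.min(limit, size);
        long[] out = new long[n * 2];
        for (int i = 0; i < n; i++) {
            // Walk backwards from the most recently written pair, wrapping around.
            int pair = Math.floorMod(next - 1 - i, MAX_STATUSES);
            out[2 * i] = slots[2 * pair];
            out[2 * i + 1] = slots[2 * pair + 1];
        }
        return out;
    }
}
```

Under the `long[]` assumption, 1,200 slots at 8 bytes each come to 9,600 bytes plus a small array header, which is consistent with the "about 10kb" per user quoted below.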

> Each user utilizes about 10kb of memory to represent their home timeline. For a Twitter-scale deployment of 500M users, that requires about 4.7TB of memory total around the cluster, which is easily achievable.
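As a rough check of the quoted total, assuming "10kb" means 10 KiB and "TB" is read as TiB (both are my assumptions about the units, not something the post states):

$$
5 \times 10^{8}\ \text{users} \times 10 \times 1024\ \text{B} = 5.12 \times 10^{12}\ \text{B} \approx \frac{5.12 \times 10^{12}}{2^{40}}\ \text{TiB} \approx 4.66\ \text{TiB}
$$

That rounds to the 4.7TB figure in the quote.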

Isn't this the most difficult (expensive) part, and one that Rama has little to do with? It appears that the other parts don't have to be Rama either.



We're storing those in-memory within the Rama modules materializing the home timelines. And the query topologies that refresh home timelines for lost partitions are colocated with that. This is dramatically simpler than operating a separate in-memory database, and Rama has everything to do with that.


It appears simpler and better without Rama.

> So instead of having to do network operations, serialization, and deserialization, the reads and writes to home timelines in our implementation are literally just in-memory operations on a hash map. This is dramatically simpler and more efficient than operating a separate in-memory database.



