
Naive idiot alert.. could you process the "same" DNA numerous times then take the mode in each case? Or are the errors essentially ones that would be repeated each time?


Actually PacBio does just that to get a better error rate. They basically have X amount of sequencing that can be done. You can spend that X however you'd like. If you want a sequence that is X long, you'll have a higher error rate. If you want a chunk that is X/10 long, you can circularize it, and thus sequence it ten times. This gets you better accuracy with the redundancy. DNA is pretty robust.
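To make the "sequence it ten times and take the mode" idea concrete, here is a rough Python sketch. It is a toy model, not PacBio's actual method: it assumes the passes are already aligned base-for-base and that errors are substitutions only, whereas real reads also contain insertions and deletions and need alignment first. The function names and the error rate in the usage note are made up for illustration.

```python
import random
from collections import Counter


def add_substitution_errors(seq, error_rate, rng):
    """Return a noisy copy of seq; each base is swapped with probability error_rate.

    Assumption: substitution errors only, independent per base (a toy model).
    """
    bases = "ACGT"
    noisy = []
    for base in seq:
        if rng.random() < error_rate:
            # Pick one of the three other bases uniformly at random.
            noisy.append(rng.choice([b for b in bases if b != base]))
        else:
            noisy.append(base)
    return "".join(noisy)


def majority_consensus(reads):
    """Take the most common base (the mode) at each position.

    Assumption: all reads have equal length and are already aligned.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def mismatch_count(a, b):
    """Count the positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))
```

For example, generating ten noisy passes of a random 2,000-base sequence with `add_substitution_errors(truth, 0.15, rng)` and comparing `mismatch_count(truth, majority_consensus(passes))` against the mismatch count of a single pass should show the consensus is far closer to the truth, as long as the errors are independent between passes.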

In practice though, even with these "circular consensus sequencing" reads, the error rate is significantly higher than with other technologies.


One doesn't need CCS anymore, though it is still an option for shorter insert lengths. The errors are more random than on other platforms, so they resolve easily with consensus. You just need 20x coverage or more, depending upon the application. With enough coverage you generally get better accuracy than any other platform because it's the least biased technology.

CCS still works really well if you want incredibly high accuracy. See this paper and figure 3 for Q90 quality reads: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3811116/

But most people don't need that for their applications. So just regular consensus using PacBio data alone is sufficient for excellent (Q60 or better) consensus accuracy: https://github.com/PacificBiosciences/GenomicConsensus/blob/...
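For anyone unfamiliar with the Q notation: these are Phred-scaled quality values, where Q = -10 * log10(p) and p is the per-base error probability. A small Python sketch of the conversion follows; the function names and the 5 Mb genome size in the usage note are just illustrative choices, not anything from the linked paper or from GenomicConsensus.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality value Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert a per-base error probability to a Phred quality value."""
    return -10 * math.log10(p)


def expected_errors(q, length):
    """Expected number of wrong bases in a sequence of `length` bases at quality Q."""
    return phred_to_error_prob(q) * length
```

So Q60 is about one error per million bases and Q90 about one per billion. Under that definition, `expected_errors(60, 5_000_000)` works out to roughly five wrong bases in a hypothetical 5 Mb consensus.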


I don't think they push CCS much anymore. The focus seems to be on generating long reads and then using Quiver to call the consensus (https://github.com/PacificBiosciences/GenomicConsensus/blob/...).


That's probably wise of them. The last time I looked at data from them, the CCS reads were still very noisy. It's better to rely on your strengths. In their case, it's long reads.




