Hello,
Thank you for evaluating EMS! Here are my thoughts on this problem.
EMS should be (and we believe is) on the same level of robustness as ENC. In the situation you just described, what would happen if ENC0 fails? Different encoders, different timestamps, completely different HLS outputs, no matter what. Unless, somehow, you synchronise ENC0-n and guarantee they produce identical timestamps down to the last frame.
I agree, failover is a must. I suggest the following scheme:
EMS is completely hidden from the outside world, while EMS1-n and WEB1-n form the first line of attack, serving live streams via RTMP/RTSP and HLS/HDS.
Now, suppose the ENC shoots out a multicast TS stream which is intercepted by multiple EMSs, and each of those has to generate an independent HLS output (like you said). Even if we base our segment naming on timestamps from the stream itself, there is no way to tell or guarantee that both EMSs will receive *identical* input. Hence, chunking may occur at different moments, resulting in two completely different sets of chunk names and contents (UDP packets are sometimes lost, and there is nothing we can do about that).
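To make the divergence concrete, here is a minimal, purely illustrative sketch (the frame counts and chunking rule are hypothetical stand-ins, not EMS internals): two instances chunk the same feed, but one misses a single UDP packet, and every chunk boundary after the loss shifts.

```python
# Hypothetical model: chunk boundaries are decided locally by each EMS,
# so a single lost packet shifts every subsequent boundary on that node.

CHUNK_TARGET = 4  # frames per chunk (stand-in for ~4 s of media)

def chunk_stream(frames):
    """Group incoming frames into fixed-size chunks, in arrival order."""
    chunks, current = [], []
    for f in frames:
        current.append(f)
        if len(current) == CHUNK_TARGET:
            chunks.append(tuple(current))
            current = []
    if current:
        chunks.append(tuple(current))
    return chunks

frames = list(range(1, 13))                          # 12 frames pushed by ENC
ems1 = chunk_stream(frames)                          # EMS1 receives everything
ems2 = chunk_stream([f for f in frames if f != 5])   # EMS2 lost frame 5

print(ems1)  # chunks line up only until the lost packet
print(ems2)  # from frame 5 onward, every chunk differs from EMS1's
```

The first chunk matches on both nodes; every chunk after the lost packet has different content, so the two HLS outputs can never be interchanged.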
If ENC is guaranteed to deliver an *identical* set of A/V frames with identical timestamps to a number of EMSs, then yes, we will make the necessary changes to base the chunk naming on timestamps, producing identical HLS streams in the end. There is still a problem: even if we do this, nobody guarantees that a chunk is produced at the same instant on both EMSs (serialisation and disk-flushing differences across two different machines). So, if someone gets the m3u8 from WEB1 and then tries to get a chunk from WEB2, that chunk is not guaranteed to exist there (a millisecond of delay is enough to make that happen).
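For clarity, this is the kind of timestamp-derived naming we have in mind; the segment duration and name pattern below are hypothetical, but they show why identical input would yield identical chunk names on every EMS:

```python
# Hypothetical naming rule: a chunk is named after the 4-second
# PTS bucket it starts in (PTS runs on the 90 kHz MPEG-TS clock),
# so the name depends only on the stream, not on the machine.

SEGMENT_DURATION = 90000 * 4  # 4 seconds in 90 kHz ticks

def chunk_name(pts):
    # e.g. segment_360000.ts for any PTS in [360000, 720000)
    return "segment_%d.ts" % (pts - pts % SEGMENT_DURATION)

pts_values = [i * 3000 for i in range(300)]  # ~30 fps, 10 s of frames
names = sorted({chunk_name(p) for p in pts_values})
print(names)  # ['segment_0.ts', 'segment_360000.ts', 'segment_720000.ts']
```

Any EMS fed the same PTS sequence computes the same names; what it cannot compute is *when* each chunk lands on disk, which is exactly the race described above.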
The way I always recommend using HLS with gapless failover is to have one and only one EMS, as shielded as possible from the outside and as close as possible to the encoder, producing the HLS. Then, via network shares/replication/etc., transport the resulting files to one or more real first-line-of-attack web servers and let them do the heavy lifting. A simple proxy server installed on each WEB1-n is enough, because it does the job for you beautifully (caching, content expiry, all the good stuff). Moreover, EMS has an extensive REST API that tells you when and where a chunk/playlist is produced, so you know about it the moment it is safely and permanently on disk. In the situation described, the replicator is guaranteed to get no more than n HTTP requests per file; the proxy makes sure of that. So the replicator is going to do pretty much nothing. You can even collapse EMS and the replicator onto the same physical machine.
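To illustrate why the replicator stays idle, here is a toy sketch of the caching behaviour on one WEB node (the paths and byte payloads are invented; any real caching proxy implements this far more completely): only the first request for a chunk reaches the origin, every later one is served from cache.

```python
# Hypothetical caching proxy on one WEB node: 1000 viewers request the
# same chunk, but the origin (the replicator side) is hit exactly once.

origin_hits = {}

def origin_fetch(path):
    """Stand-in for fetching the chunk from the replicated store."""
    origin_hits[path] = origin_hits.get(path, 0) + 1
    return b"<chunk bytes for %s>" % path.encode()

cache = {}

def proxy_get(path):
    if path not in cache:          # cache miss -> one origin request
        cache[path] = origin_fetch(path)
    return cache[path]             # cache hit -> origin untouched

for _ in range(1000):              # 1000 viewers ask for the same chunk
    proxy_get("/live/segment_42.ts")

print(origin_hits)  # {'/live/segment_42.ts': 1}
```

With n such WEB nodes, each file costs the replicator at most n requests in total, which is the bound mentioned above.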
Looking forward to your comments on the proposed architecture; meanwhile, we will create an experimental build that derives the chunk names from the timestamps found in the stream itself.
Best regards,
Andrei