Hey @Max,
Thank you for the answer!
So, both options are server-side only.
The first one won't work for us because it allows only one watermark, and we need dynamic captions: several per video, added on the fly depending on the live stream content.
The second option can theoretically work, but...