feedback-bot and zephyr_mirror will need to be updated and restarted
when this is deployed to prod.
(imported from commit fe2b524424c174bcb1b717a851a5d3815fda3f69)
Features include:
* Not forking into two processes (shells out to zwrite to send
instead). This makes life easier since we're not doing concurrent
programming.
* Eliminated a lot of hard-to-read or unnecessary debugging output.
* Adding explanatory test suggesting the likely problem for some
common sets of received messages.
* Much less code duplication.
* Support for testing a sharded zephyr_mirror script (--sharded).
* Use of the logging module to print timestamps -- makes debugging
some issues a lot.
* Only one sleep, and for only 10 seconds, between sending the
outgoing messages and checking that they were received.
* Support for running two copies of this script at the same time, so that
running it manually doesn't screw up Nagios.
* Passed running 100 tests run in a row.
(imported from commit a3ec02ac1d1a04972e469ca30fec1790c4fb53bc)
We should be canonicalizing stream names to class names in
update_subscriptions_from_humbug, before we even decide which classes
to subscribe to; otherwise deduplication and tracking of which classes
we're already subscribed to won't work.
(imported from commit a751b6fca1022390a087516a0730ff77f13d7edf)
Previously, we were sending "Skipping message we got from Humbug!"
for messages we wouldn't have forwarded anyway.
(imported from commit 36df85a61336ac00e3d7913d5a417d6b42764350)
In this case, if we're configured to not forward personals, there's no
point in logging a decision not to forward one.
(imported from commit 62c37591c6a70afb6235de626b0c6a3502cbcb27)
It seems that check-mirroring was reporting a lot of spurious failures
due to it sometimes taking more than 3 seconds for the setup phase of
check-mirroring's receive path to run. So fix this:
(1) Wait a bit more than 3 seconds for the receiver to subscribe to
messages
(2) Subscribe to Humbug messages before forking (we can't do this with
zephyr.init() because python-zephyr gets totally messed up if you use
it from multiple processes with a shared initialization)
(3) Get rid of the old time.sleep(0.x) values that were intended to
make messages arrive in order -- since we're now checking that
messages correctly arrived using set(), they aren't needed.
(4) Use a single request to subscribe to both zephyr classes we need
to subscribe to (saves 1 RTT).
(imported from commit d96aef05405ce43e9a4a549de189da9a2e393875)
Sometimes messages are delayed passed the allowed 10 seconds. We may
eventually decide that that amount of latency is unacceptable, but the
latency is not what this script is supposed to be checking.
(imported from commit d83a6a83d60e9eac13b3b87fb31de7f9881acabf)
This commit changes APIs and requires and update of all zephyr
mirroring bots to deploy properly.
(imported from commit 2672d2d07269379f7a865644aaeb6796d54183e1)
This improves the throughput of mirroring a large number of zephyrs in
a row from about 1.5/second to about 9/second, which is basically
satisfactory.
(imported from commit 5f72680d6290eaa02ef8ced5b3792fb3efc1db41)
Personals are now just private messages between two people (which
sometimes manifests as a private message with one recipient). The
new message type on the send path is 'private'. Note that the receive
path still has 'personal' and 'huddle' message types.
(imported from commit 97a438ef5c0b3db4eb3e6db674ea38a081265dd3)
Previously messages sent to zephyr instance " " wouldn't be forwarded
zephyr=>humbug because the target instance is the null instance; I
learned this when we got a few tracebacks in the zephyr_mirror log, so
clearly this does happen.
(imported from commit 08bd7470e75ac6af24ac83696b6cf68d70654664)
This is useful for trying out new versions of the forwarding code
while the mirroring bot is still running in production.
(imported from commit bcdaf91fed55ac0974b1efe31dd13ed006b6fd06)
This happens sometimes (especially when our server is restarting), and
isn't a _real_ problem -- it's much more important that we have a
completely reliable test that we can put a Nagios alert against.
(imported from commit 0add0b3dfc5447307014bbb9137366bd7141ade0)
Previously we were spending 15 seconds on linerva (and more like 2.5
minutes on the not-yet-operational zmirror.humbughq.com) to subscribe
to all of our streams.
(imported from commit c36cb1c26868f142683d9c92d4875fcd4931886e)
We've had multiple requests from MIT zephyr users to allow
non-alphanumeric stream names, and we haven't decided what we want to
allow, so for now allow everything.
Note that the web client and mirror script limit stream names to 30
characters, which is our database limit.
(imported from commit 2acb5ee04e5ee7c40031ac831e12d09d04bbb2e6)
Zephyr has this great property where a small fraction of the time, the
Zephyr server rejects your attempt to subscribe to something for no
particular reason, returning a "SERVNAK" error.
(imported from commit 6d5ed033d46d77a5b02539a816453724740f8fb0)
This does introduce a small security issue, in that a shell with
expired AFS tokens on the machine running the mirroring script will be
able to read the Humbug API key using /proc/pid/environ. I think this
is fine -- you can steal the API key from a running process using
ptrace anyway.
(imported from commit c6fdb798294fb32d640823b409f3e46274ca01f4)
Because in the zephyr world, people often manually linewrap their
zephyrs, we need to relax the algorithm's assumption that every line
was wrapped to the same linewrapping threshhold. Otherwise messages
that look like this:
aaaaa aaaaa aaaaa aaaaaaaaaa aaaaa aaaaa aaaaa aaaaa
aaaaa aaaaa aaaaa aaaaaaaaaa aaaaa aaaaa aaaaa aaaaa aaaaa aaaaaa
aaaaa aaaaa aaaaa aaaaaaaaaa aaaaa aaaaa aaaaa aaaaa
might be treated as a paragraph break, even though that was unlikely
to have been the sender's intention.
(imported from commit e1144da2a3efc3241a8f15b0f19fea2ea6684254)
It's generally best to first try killing with SIGTERM, and if we fail
to kill the child, we'll end up trying to kill them with SIGKILL
anyway when the new zephyr_mirror process starts up,
(imported from commit cfee2dd5f809f6e38d90a09be82719a8660d8377)