This shows up when you're not running a Zephyr mirroring bot and lets
you use Webathena to have us run it. Obviously needs more docs.
Current problems include:
* supervisorctl reload ends up recreating /var/run/supervisor.sock
with the wrong permissions, so it only works once in a row before
you need to chmod that.
* /etc/supervisor/conf.d needs to be humbug-writeable; this is a clear
local root vulnerability
* This uses SSH and thus is kinda slow.
(imported from commit 7029979615ffd50b10f126ce2cf9a85a5eefd7a2)
The davidben-patched-for-roost Zephyr branch (available at
https://github.com/davidben/zephyr/tree/roost) adds Zephyr support for
these options. We also patch python-zephyr to expose them. These
basically let you save your Zephyr tickets and port number to a file,
so that you can later restore them (even potentially after the machine
rebooted). Basically because Zephyr is UDP, the Zephyr server will
continue trying to deliver messages to a particular port number that
was registered for up to 20 minutes after getting an error; so we can
even have downtime and reboot and still get our packets so long as we
restore the sessions within 20 minutes.
(imported from commit 986cbb157ddfa57aa4b644cd826f8418e9876dc7)
Previously it only provided the list of all public streams; now it
allows one to specify any union of some of the following:
* all public streams
* all streams the user subscribed to
(the most relevant being the union of those two, which is what we want
for the "streams" page).
Or:
* all streams in realm (superuser only)
The manual task required is that when this is pushed to prod, we need
to also deploy the new sync-public-streams version to zmirror.
(imported from commit 27848b8bd136e2777f399b7d05b2fdcec35e4e21)
Our .crypt-table parsing code isn't quite correct, in that we don't
handle either the "zcrypt default" or "zcrypt by class/instance" pair
options (for sending messages in either direction) -- you have to be
zcrypting for an entire class. I think this makes sense given that on
the Zulip end we can only enforce anything on a stream level.
(imported from commit a7901b1dc025a04a23ee71ecdd499e3f150ba614)
When we deploy this, we should remove the relevant jobs from root's
crontab on our app servers.
(imported from commit 749be952d504f5a4d243cf59f6430acc689fc821)
For now we only support the AES encryption type since the DES one is
probably not used anymore.
(imported from commit 222606db9f704917e74159e7d07a110187a236e6)
Just before this is pushed to prod, we need to rename the Humbug feedback
bot in the database using:
./manage.py change_user_email feedback@humbughq.comfeedback@zulip.com
/etc/init.d/memcached restart
and we also need to update and restart feedback-bot in its deployed
location.
No action is required on pushing this to staging, but in between when
this is pushed to staging and when it is pushed to prod (and that
transition performed), feedback will not work on staging.
(imported from commit 73fc36f680b978f3aebae5df1822918ae4d4e952)
When we push this to staging, we'll need to rename the bot in the
database and also pull on git.zulip.net.
(imported from commit 22b2397b197c8820f0e55daecd8f98d829e195bd)
This must be deployed after we update our running nginx configuration
to serve api.humbughq.com.
(imported from commit b5c34ebdd595f55eecd6dca6a18a37f105107bd5)
Since in the future we might want requests to add subscriptions to
include things like colors, in_home_view, etc., we're changing the
data format for the add_subscriptions API call to pass each stream as
a dictionary, giving a convenient place to put any added options.
The manual step required here is updating the API version in AFS
available for use with the zephyr_mirror.py system.
(imported from commit 364960cca582a0658f0d334668822045c001b92c)
This way we can return properties of the streams other than just their
names in future versions of the API without breaking old clients.
The manual step required is to deploy the updated version of
sync-public-streams on zmirror.humbughq.com when we deploy this code
to prod.
(imported from commit 42b86d8daa5729f52c9961dd912c5776a25ab0b4)
For consistency, and because nobody could think of a reason to have it live
in bots/ with a symlink.
(imported from commit def372653fcdde2805729134fec9d4bc3ce294ec)
Modified files need to be copied into the right place. The checkout
on git.humbughq.com also needs to be updated.
(imported from commit dbe9e05a0512e1f59c7819dd8d44c2c4e9c83bcf)
feedback-bot needs to be updated and restarted after this is pushed to
prod for these changes to take effect.
(imported from commit fcabd2f4bb26c794454e096242a8073805fc786c)
We leave the stuff under api/ alone for now, since we need to be able to ship
it as a standalone thing.
tools/post-receive wasn't using the function anyway.
For push to master: Push this commit, update post-receive per instructions at
the top of that file, then push the rest of the branch to confirm that the hook
still works.
No manual instructions for prod.
(imported from commit 9bcbe14c08d15eda47d82f0b702bad33e217a074)
At Ksplice we used /usr/bin/python because we shipped dependencies as Debian /
Red Hat packages, which would be installed against the system Python. We were
also very careful to use only Python 2.3 features so that even old system
Python would still work.
None of that is true at Humbug. We expect users to install dependencies
themselves, so it's more likely that the Python in $PATH is correct. On OS X
in particular, it's common to have five broken Python installs and there's no
expectation that /usr/bin/python is the right one.
The files which aren't marked executable are not interesting to run as scripts,
so we just remove the line there. (In general it's common to have libraries
that can also be executed, to run test cases or whatever, but that's not the
case here.)
(imported from commit 437d4aee2c6e66601ad3334eefd50749cce2eca6)
Realistically, if the bot crashes once, it'll probably crash the next
time too, so I'm not convinced we need this loop at all, but in the
interests of avoiding churn on an extensively tested script, I'm going
to err on the side of the minimal change here.
(imported from commit e2bbd3700395ba4d0b181a4616e816e8f1231669)
This change adds a copyright notice and moves our site-specific bits
to global constants at the top of the file.
(imported from commit ccc8cf10f2d0d70c7500b12c7849406268313bae)
This adds two characters to the length of our default format field, but
based on a conversation I had with kcr, I think this should probably be
okay. If it's a problem, the symptom we'll see is that certain people
will be unable to send zephyrs with this default format (so, certain
Humbug users will have their forwarding consistently fail).
We need to remember to, in a future commit (once everyone has started
using the updated version), remove the:
> or notice.format.endswith("@(@color(blue))")
(imported from commit 703ef60f524646bca8d5099c9066efabd365be43)
Fixes#602.
I replaced the SIGKILL with a SIGINT, and then catch SIGINT with a
handler. This handler calls cancelSubs if necessary, and can later be
edited to perform other clean-up operations, too. I thought about, in
this same commit, changing the SIGTERM in
maybe_restart_mirroring_script to a SIGINT, but after tracing out the
code paths, I realized that isn't necessary. (The SIGTERM is
necessarily performed on a process that has not subscribed to any
zephyr classes, so cancelSubs is unnecessary. If we do think that we
may want to add additional clean-up operations in the future, though,
then it might be worth investigating changing this SIGTERM.)
(imported from commit 692b295be6cb40b0e4ec2ca0bc58c58056ed9bd9)
Bots are not part of what we distribute, so put them in the repo root.
We also updated some of the bots to use relative path names.
(imported from commit 0471d863450712fd0cdb651f39f32e9041df52ba)
Previously, if users of our code put the API folder in their pyshared
they would have to import it as "humbug.humbug". By moving Humbug's API
into a directory named "humbug" and moving the API into __init__, you
can just "import humbug".
(imported from commit 1d2654ae57f8ecbbfe76559de267ec4889708ee8)
This was causing about 10% of the time, personals being forwarded by
tabbott/extra@ATHENA.MIT.EDU to get this error:
zwrite: Field too long for buffer while sending notice to
tabbott/extra@ATHENA.MIT.EDU
We don't fully understand the cause of the problem, but this fixes it.
(imported from commit 22c39a1061cde9ad6973ef07aee7227623a3d47d)
Sometimes the very first message we send isn't received by
python-zephyr; this could be because of our growing a 17-message queue
of zephyrs to service, so let's not do that.
(imported from commit 281bf1807442b6335b05c803b1a47e0a162bef4e)
This ensures that Humbug receives the verbatim text of the Nagios
body, rather than a slightly modified version. Tested by running
manually.
(imported from commit 7e0eea0b230fd8c5552860acfb7372099598f036)
The refactoring that we did a couple weeks ago to make the zephyr
mirror script restart itself automatically (by splitting it into
zephyr_mirror.py and zephyr_mirror_backend.py) had a poor interaction
with our code for killing old zephyr_mirror processes (to prevent
double-mirroring). If you manually ran two copies of the outer
mirroring script (zephyr_mirror.py), then the second one would on
startup kill the first one's zephyr_mirror_backend.py children (for
being duplicate zephyr_mirror_backend.py processes that would result
in double-mirroring). However, importantly, it did not kill the first
mirroring script's zephyr_mirror.py parent process, so the first
mirroring script would then proceed to startup up new children. The
process then repeats, with the two scripts' roles reversed.
This issue didn't affect the sharded mirroring case, where I had been
doing the testing of the kill code with that refactoring, because we
don't have a version of the outer zephyr_mirror.py loop for that
situation (a consequence of it being hard to restart the threads
properly with the run_parallel API that we're using to spawn all the
children).
(imported from commit d4886ac77312a6b0ebd0d612f6fb084970bf23a2)
This is probably a lot more annoying to use, in that e.g. there are
separate log files for the two directions of the mirror, but we
haven't used these logs for much, so whatever.
(imported from commit d3bc407d90099214d242526c01cd3d3cd9d9d9bd)
Previously all that we logged was the sender, which turns out to be a
constant for humbug=>zephyr mirroring.
(imported from commit 527a3ac4b9b815a2b4d6b63c3ad92d9d5ad99a6e)
feedback-bot and zephyr_mirror will need to be updated and restarted
when this is deployed to prod.
(imported from commit fe2b524424c174bcb1b717a851a5d3815fda3f69)
Features include:
* Not forking into two processes (shells out to zwrite to send
instead). This makes life easier since we're not doing concurrent
programming.
* Eliminated a lot of hard-to-read or unnecessary debugging output.
* Adding explanatory test suggesting the likely problem for some
common sets of received messages.
* Much less code duplication.
* Support for testing a sharded zephyr_mirror script (--sharded).
* Use of the logging module to print timestamps -- makes debugging
some issues a lot.
* Only one sleep, and for only 10 seconds, between sending the
outgoing messages and checking that they were received.
* Support for running two copies of this script at the same time, so that
running it manually doesn't screw up Nagios.
* Passed running 100 tests run in a row.
(imported from commit a3ec02ac1d1a04972e469ca30fec1790c4fb53bc)
We should be canonicalizing stream names to class names in
update_subscriptions_from_humbug, before we even decide which classes
to subscribe to; otherwise deduplication and tracking of which classes
we're already subscribed to won't work.
(imported from commit a751b6fca1022390a087516a0730ff77f13d7edf)
Previously, we were sending "Skipping message we got from Humbug!"
for messages we wouldn't have forwarded anyway.
(imported from commit 36df85a61336ac00e3d7913d5a417d6b42764350)
In this case, if we're configured to not forward personals, there's no
point in logging a decision not to forward one.
(imported from commit 62c37591c6a70afb6235de626b0c6a3502cbcb27)
It seems that check-mirroring was reporting a lot of spurious failures
due to it sometimes taking more than 3 seconds for the setup phase of
check-mirroring's receive path to run. So fix this:
(1) Wait a bit more than 3 seconds for the receiver to subscribe to
messages
(2) Subscribe to Humbug messages before forking (we can't do this with
zephyr.init() because python-zephyr gets totally messed up if you use
it from multiple processes with a shared initialization)
(3) Get rid of the old time.sleep(0.x) values that were intended to
make messages arrive in order -- since we're now checking that
messages correctly arrived using set(), they aren't needed.
(4) Use a single request to subscribe to both zephyr classes we need
to subscribe to (saves 1 RTT).
(imported from commit d96aef05405ce43e9a4a549de189da9a2e393875)
Sometimes messages are delayed passed the allowed 10 seconds. We may
eventually decide that that amount of latency is unacceptable, but the
latency is not what this script is supposed to be checking.
(imported from commit d83a6a83d60e9eac13b3b87fb31de7f9881acabf)
This commit changes APIs and requires and update of all zephyr
mirroring bots to deploy properly.
(imported from commit 2672d2d07269379f7a865644aaeb6796d54183e1)
This improves the throughput of mirroring a large number of zephyrs in
a row from about 1.5/second to about 9/second, which is basically
satisfactory.
(imported from commit 5f72680d6290eaa02ef8ced5b3792fb3efc1db41)
Personals are now just private messages between two people (which
sometimes manifests as a private message with one recipient). The
new message type on the send path is 'private'. Note that the receive
path still has 'personal' and 'huddle' message types.
(imported from commit 97a438ef5c0b3db4eb3e6db674ea38a081265dd3)
Previously messages sent to zephyr instance " " wouldn't be forwarded
zephyr=>humbug because the target instance is the null instance; I
learned this when we got a few tracebacks in the zephyr_mirror log, so
clearly this does happen.
(imported from commit 08bd7470e75ac6af24ac83696b6cf68d70654664)
This is useful for trying out new versions of the forwarding code
while the mirroring bot is still running in production.
(imported from commit bcdaf91fed55ac0974b1efe31dd13ed006b6fd06)
This happens sometimes (especially when our server is restarting), and
isn't a _real_ problem -- it's much more important that we have a
completely reliable test that we can put a Nagios alert against.
(imported from commit 0add0b3dfc5447307014bbb9137366bd7141ade0)
Previously we were spending 15 seconds on linerva (and more like 2.5
minutes on the not-yet-operational zmirror.humbughq.com) to subscribe
to all of our streams.
(imported from commit c36cb1c26868f142683d9c92d4875fcd4931886e)
We've had multiple requests from MIT zephyr users to allow
non-alphanumeric stream names, and we haven't decided what we want to
allow, so for now allow everything.
Note that the web client and mirror script limit stream names to 30
characters, which is our database limit.
(imported from commit 2acb5ee04e5ee7c40031ac831e12d09d04bbb2e6)
Zephyr has this great property where a small fraction of the time, the
Zephyr server rejects your attempt to subscribe to something for no
particular reason, returning a "SERVNAK" error.
(imported from commit 6d5ed033d46d77a5b02539a816453724740f8fb0)
This does introduce a small security issue, in that a shell with
expired AFS tokens on the machine running the mirroring script will be
able to read the Humbug API key using /proc/pid/environ. I think this
is fine -- you can steal the API key from a running process using
ptrace anyway.
(imported from commit c6fdb798294fb32d640823b409f3e46274ca01f4)
Because in the zephyr world, people often manually linewrap their
zephyrs, we need to relax the algorithm's assumption that every line
was wrapped to the same linewrapping threshhold. Otherwise messages
that look like this:
aaaaa aaaaa aaaaa aaaaaaaaaa aaaaa aaaaa aaaaa aaaaa
aaaaa aaaaa aaaaa aaaaaaaaaa aaaaa aaaaa aaaaa aaaaa aaaaa aaaaaa
aaaaa aaaaa aaaaa aaaaaaaaaa aaaaa aaaaa aaaaa aaaaa
might be treated as a paragraph break, even though that was unlikely
to have been the sender's intention.
(imported from commit e1144da2a3efc3241a8f15b0f19fea2ea6684254)
It's generally best to first try killing with SIGTERM, and if we fail
to kill the child, we'll end up trying to kill them with SIGKILL
anyway when the new zephyr_mirror process starts up,
(imported from commit cfee2dd5f809f6e38d90a09be82719a8660d8377)
This eliminates the problems with python-zephyr not being able to
handle non-ascii characters in instances and class names.
(imported from commit c9f295cb18bc5043cd8efecbe6996ff373f66c9a)