15:07 #startmeeting
15:07 Meeting started Thu Jul 31 15:07:43 2014 UTC. The chair is xnox. Information about MeetBot at http://wiki.ubuntu.com/meetingology.
15:07 Available commands: action commands idea info link nick
15:07 don't know what the procedure is...
15:07 #topic Lightning round
15:08 go ?
15:08 Last week
15:08 -changed direction and abandoned the security update to 7u55-2.4.7
15:08 -merged debian package 7u65-2.5.1 to ubuntu utopic, merged differences into this package
15:08 -backported it to 12.04, 14.04 and utopic, this went well except for 12.04
15:08 This week
15:08 -posted test differences between various openjdk versions for regression analysis
15:08 -struggling to deal with 'quotes' in a call to 'configure' in the build/ directory and wondering why the build system is reverting Matthias's patches
15:08 -continued to debug the above
15:08 -get familiar with debian version 3 packaging
15:08 done
15:08 ...
15:08 $ shuf -e jodh robru sil2100 xnox barry bmurray doko stgraber
15:08 xnox
15:08 barry
15:08 jodh
15:08 robru
15:08 sil2100
15:08 bmurray
15:08 doko
15:09 stgraber
15:09 not as fancy.
15:09 ..
15:09 whoops
15:09 * usb-creator - formatting fixes uploaded
15:09 * mega-transition helping out tie up loose ends
15:09 * llvm 3.5 - fixed powerpc build (committed upstream) and fixed
15:09 ppc64el packaging (accepted in debian). Will switch default to 3.5
15:09 after alpha2 freeze block is lifted. That will temporarily add
15:09 3.5 & 3.4 in main, until next mesa update.
15:09 * jodh wonders who 'not as fancy' is :)
15:09 * uploaded ubuntukylin-meta sitting in NEW, but proper further seed
15:09 setup is still needed.
15:09 * exit interview, and HRy things.
15:09 * working on porting more lazr.* things to python3
15:09 * sent rebased apt-ftpmaster generate contents only patch to mvo on
15:09 BTS.
(To enable it to be used, will need a further patch to cronscripts)
15:09 TODO:
15:09 * complete more systemd/startpar tasks
15:09 * complete lazr.*/launchpadlib ports
15:09 ..
15:09 barry: =)
15:09 jodh: oh just a comment. as in it's not a one liner as it's usually pasted around here.
15:10 xnox: yeah, I'm just being silly.
15:10 phone: system-image 2.3.1. smoke test crash investigation (LP: #1349478). discussing jani's branches. other stuff.
15:10 Launchpad bug 1349478 in Ubuntu system image "/usr/sbin/system-image-dbus:sqlite3.OperationalError:_check_for_update:emit_signal:UpdateAvailableStatus:__init__:__enter__:_cursor" [High,In progress] https://launchpad.net/bugs/1349478
15:10 debuntu: zope stack (all squared away now!). gunicorn py3 package support (debian bug #756057). testing pyvenv/virtualenv from trusty PPA.
15:10 Debian bug 756057 in gunicorn "gunicorn: Support Python 3" [Normal,Open] http://bugs.debian.org/756057
15:10 jodh: and it didn't shuffle much, did it?!
15:10 other: dealing w/various hardware issues. other discussions on various other topics.
15:10 ..
15:11 * upstart
15:11 jodh: =)
15:11 *** upstart 1.13.1-0ubuntu2 upload to fix cgmanager tests.
15:11 * system-image
15:11 * Spent most of the week learning about how system image builds
15:11 (ongoing :)
15:11 * Working with mvo, cjwatson and stgraber on setting up new images.
15:11 Γ
15:11 barry: gunicorn python3 \o/
15:11 the all prepared robru is next =)
15:11 xnox: yep, have github pull req pending for debian maintainer. should hopefully land soonish
15:11 * landings, landings, landings, landings!
15:11 * fixed up qtcompositor landing in ubuntu-system-settings
15:11 * learned all about Jasmine unittesting framework for JS -- very nice!
15:11 * wrote vast amounts of unit test code for both CI Train Silo Dashboard and NFSS Web UI
15:11 * attended GUADEC -- saw many talks and took lots of notes
15:11 * Several CI Vanguard shifts
15:11 * stopped some bits of CI Train from hard-coding spreadsheet column indexes, allowing greater
15:11 flexibility to change the spreadsheet in the future (in preparation for RTM)
15:11 * vast simplification of CI Train spreadsheet, dropping all stupid "silo tabs" and streamlining
15:11 the testing:done? setting into the pending tab, which will make the spreadsheet much easier
15:11 to navigate when RTM doubles the number of silos we have.
15:11 * various minor pre-RTM preparatory fixes & changes in CI Train
15:11 * 99 commits against the CI Train silo dashboard page. yikes! (many many many small tweaks and
15:11 iterations, and increases in test coverage).
15:12 awesome!
15:12 robru: sil2100: has ci-train been tested / dry-run against rtm on dogfood?
15:13 xnox: doing it all the time ;)
15:13 sil2100: awesome.
15:13 sil2100: you are next! =)
15:13 * slangasek waves belatedly
15:13 xnox: just got blocked by some firewall blockings, but now I'm unblocked again, webops helped out
15:13 xnox, sil2100 has taken charge of the ci train core. I'm mostly working on the periphery, like the dashboard and queuebot.
15:13 #chair slangasek
15:13 Current chairs: slangasek xnox
15:13 o/
15:13 - Landing team work, landing e-mails, landing coordination - standard stuff
15:13 - Pushing on promotion and TRAINCON-0 issues
15:13 - CI Train maintenance and features:
15:13 * Decoupling prod from preprod, now for testing preprod custom branches can be used
15:13 * Small enablements related to the modified spreadsheet
15:13 * Continuing work on enabling other distributions (ubuntu-rtm)
15:13 - More firewall holes needed in our prodstack instance
15:13 - More complex silo configuration handling
15:14 - Many many corner cases where ubuntu was still considered instead of selected distro
15:14 * Test landing of indicator-location to ubuntu-rtm (in progress)
15:14 * Prepared branch with 'retry failed jobs'
15:14 * Add additional twin upload projects
15:14 * No time to finish work on auto-merge-clean
15:14 - Work on defining TRAINCON-0 formal rules
15:14 - Packaging advice for some upstream developers
15:14 - Documenting the TRAINCON-0 incident
15:14 - Playing around with some hardware
15:14 done
15:14 this covers about 2 weeks for me since I was on holiday last thursday
15:14 updated errors for assets.ubuntu.com to r486
15:14 testing of errors frontend change to filter on pkg_arch
15:14 submitted RT to have errors frontends updated and errors_static_url modified
15:15 set up log rotation for the daisy, errors frontends and pushed to the daisy and errors charms
15:15 updated daisy retracer to keep core dumps when retracing and save crash files in certain situations
15:15 submitted rt to have retracers updated to r504
15:15 updated RT 72977 regarding errors log rotation
15:15 submitted RT 73492 regarding updating the daisy frontends to r498
15:15 updated daisy bucket code to pass architecture to bucketversionscount
15:15 pushed bzr changes for daisy to increment arch counters
15:15 created some armhf retrace success failures graphs in graphite
15:15 submitted RT to add more retracers for errors
15:15
pinged a webop to run import-user-packages cron job (they said it worked but still nothing in the ColumnFamily)
15:15 manually ran import_user_packages to the temp DSE ring database
15:15 research into recoverable problem bucket grouping upstart and url-dispatcher issues
15:15 discovered and reported apport bug 1349579
15:15 bug 1349579 in apport (Ubuntu) "whoopsie-upload-all uses an incorrect assumption regarding what to upload" [Undecided,Fix released] https://launchpad.net/bugs/1349579
15:15 submitted merge proposal fixing apport bug 1329520
15:15 bug 1329520 in apport (Ubuntu) "whoopsie-upload-all crashes while processing crash file" [High,Fix released] https://launchpad.net/bugs/1329520
15:15 investigation into SystemImageInfo not appearing in apport .crash files on the phone
15:15 uploaded new version of apport to utopic which will properly gather SystemImageInfo
15:15 uploaded new version of whoopsie to utopic that will send SystemImageInfo to errors
15:15 tested whoopsie bug 1320988 regarding online / offline connectivity
15:15 bug 1320988 in whoopsie (Ubuntu) "whoopsie did not become on-line after connecting to wifi" [High,Confirmed] https://launchpad.net/bugs/1320988
15:15 investigation into whoopsie bug 1340604
15:15 bug 1340604 in whoopsie (Ubuntu) "[phone] crash files are only uploaded on boot when not running in the foreground" [Undecided,New] https://launchpad.net/bugs/1340604
15:15 reported apport bug 1347009 regarding retraced crashes missing stacktrace
15:15 bug 1347009 in Daisy "apport-retrace occassionally creates a retraced report without a stacktrace" [Low,Triaged] https://launchpad.net/bugs/1347009
15:15 research into duplicates for ubuntu-release-upgrader bug 1347721
15:15 bug 1347721 in apt (Ubuntu Trusty) "Saucy -> Trusty upgrade failed: procps fails to configure" [High,Triaged] https://launchpad.net/bugs/1347721
15:15 ✔ done
15:16 - GCC default set to 4.9
15:16 - update of cross toolchains
15:16 - openjdk-7 mentoring and fixes
15:16 -
clean up component mismatches, dep-waits, ftbfs in main, three days of nagging and fixing
15:16 - packaging review of some third party software
15:16 - updated tightvnc, updated tigervnc (at least it built for arm64 and ppc64el)
15:16 - twisted transition (ubuntu-sso-client still unfixed)
15:16 - arm64 toolchain discussion
15:16 (done)
15:17 Was on vacation last week.
15:17 E-mail and IRC catchup on Monday mostly.
15:17 Helped with some code reviews, discussions, ... wrt ubuntu core system-image.
15:17 Got an initial system-image published for ubuntu-core.
15:17 Discussed partitioning plan for new touch devices.
15:17 Fixed some LXC CI issues.
15:17 Recorded a video on running GUI apps inside LXC: https://www.youtube.com/watch?v=QYsj9LEqxXk
15:17 Poked some more at running Unity8 inside LXC.
15:17 Some more LXC-related discussions (conference planning, ...)
15:17 Discussed some of our NetworkManager patches.
15:17 Fixed a couple of configuration issues with the ISO tracker related to alpha-2, 14.04.1 and 12.04.5.
15:17 (DONE)
15:19 slangasek: your turn =) and take over chairing =)
15:19 hmm :)
15:20 * working with bhuey on prepping openjdk security update
15:20 * discussions around the 'init' package and systemd-sysv to unblock new images using systemd from the start
15:20 * tracking crash retracing success on the phones; working with bdmurray et al. to get any blocking issues fixed in advance of RTM
15:20 * performance reviews
15:20 * helped move forward some HWE SRUs related to a server engagement
15:20 * helped with getting out of TRAINCON-0 on Monday
15:21 * next week: joining a cloud team sprint (as is Colin), so expect limited availability
15:21 * the week after: on vacation
15:22 bdmurray: on the errors.u.c side, when do you expect we'll have the per-image view available?
15:22 slangasek: what was the final answer for the counter?
15:23 bdmurray: ah, in scrollback, let's iron that out after the meeting?
15:23 slangasek: okay, it shouldn't take too long to add
15:23 any other questions over status?
15:24 [TOPIC] Upstart cgroup support
15:25 forgot to do an in-depth topic for last week's meeting... remembered this week :)
15:25 Thanks Steve. Everyone sitting comfortably?
15:25 so jodh will talk a bit about the work he did to implement cgroup support into upstart
15:25 Today, I'm going to give a brief [1] talk about cgroup support in upstart
15:25 and some of the challenges we faced. This may go some way in explaining
15:25 the seemingly never-ending upstart async branch updates I've given in
15:25 the past in this meeting :-)
15:25 = Intro =
15:25 As of version 1.13, Upstart supports cgroups. By "support", I mean "has
15:25 the ability to place job processes into one or more cgroups" (for
15:25 service resource control). It does _not_ mean that Upstart uses cgroups
15:25 to mop up the mess if a service ends badly (process supervision).
15:25 = The cgroup Stanza =
15:26 After thrashing out the design with stgraber, slangasek and hallyn
15:26 (http://upstart.ubuntu.com/wiki/Cgroup), we added support to parse a new
15:26 "cgroup" stanza that a job can specify. The final syntax is extremely
15:26 clean (and praise goes to stgraber for realising how simple we could make
15:26 it!). Here's a summary of the behaviour:
15:26 - If not specified, the job processes are not placed into (any new)
15:26 cgroups.
15:26 - If a job specifies a cgroup stanza, that job cannot legitimately start
15:26 until the cgroup manager itself is running. To handle this, we added a new
15:26 initctl command ("notify-cgroup-manager-address") which the
15:26 cgmanager.conf job itself calls in post-start to notify upstart where
15:26 to find cgmanager :-)
15:26 - If specified, *all* job processes are put into the specified cgroup(s).
15:26 - If specified as "cgroup CONTROLLER" ("cgroup cpuset" for example),
15:26 Upstart will add the job to a job-specific cgroup whose name will be
15:26 "$UPSTART_JOB-$UPSTART_INSTANCE".
15:26 - If specified as "cgroup cpuset foo 12", Upstart will place the job
15:26 processes into the implicit job-specific cpuset cgroup and set
15:26 "foo=12" in that group.
15:26 - If specified as "cgroup cpuset hello foo 12", Upstart will place the
15:26 job into the cpuset cgroup called "hello" and set "foo=12" in that
15:26 group. If "hello" does not exist, it will be created. This allows
15:26 multiple different jobs to enter the same cgroup if desired.
15:26 - The cgroup name ("hello" in the example above) can also contain variables:
15:26 "cgroup cpuset db/$foo/$bar-$baz".
15:26 - You can also get at the cgroup that Upstart would create on behalf of
15:26 the job using the magic $UPSTART_CGROUP variable (note that this is
15:26 _not_ an environment variable and is only valid within a cgroup
15:26 stanza). For example: "cgroup cpuset db/$UPSTART_CGROUP".
15:26 So far so good.
15:27 Since there is already an excellent cgroup manager available, and since
15:27 we try to avoid adding extra complexity to PID 1, we opted to avoid
15:27 * slangasek waits for the footnote to resolve
15:27 re-inventing the wheel by having cgmanager(8) handle the actual cgroup
15:27 operations. So, when Upstart starts a job that specified a cgroup
15:27 stanza, it needs to do the following:
15:27 - Connect to cgmanager.
15:27 - Ask cgmanager to create the cgroup(s).
15:27 - Ask cgmanager to move the specified process(es) into a cgroup.
15:27 - Ask cgmanager to apply a particular setting to a cgroup.
15:27 However, there's a problem with the above. What if cgmanager hung?
15:27 We'll come back to this, but first I need to explain how Upstart spawns
15:27 a job.
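The stanza forms summarised above can be illustrated with a small Python sketch. This is not Upstart's parser (Upstart is written in C); it only mirrors the examples given in the talk, and the argument-count interpretation (one extra word is a cgroup name, two are a key and value, three are name, key and value) is an assumption inferred from those examples. The names `CgroupStanza` and `resolve_cgroup` are hypothetical, and expansion of job variables such as `$foo` is deliberately left out.

```python
from typing import NamedTuple, Optional


class CgroupStanza(NamedTuple):
    """Hypothetical parsed form of one "cgroup" stanza line."""
    controller: str
    name: str
    key: Optional[str] = None
    value: Optional[str] = None


def resolve_cgroup(stanza: str, job: str, instance: str = "") -> CgroupStanza:
    """Interpret a stanza line following the examples from the talk.

    Assumption: the meaning of the words after the controller is decided
    purely by how many of them there are.
    """
    words = stanza.split()
    if len(words) < 2 or len(words) > 5 or words[0] != "cgroup":
        raise ValueError(f"not a recognised cgroup stanza: {stanza!r}")

    controller, args = words[1], words[2:]
    # Implicit job-specific name, "$UPSTART_JOB-$UPSTART_INSTANCE".
    default_name = f"{job}-{instance}"

    name, key, value = default_name, None, None
    if len(args) == 1:        # cgroup cpuset db/$UPSTART_CGROUP
        name = args[0]
    elif len(args) == 2:      # cgroup cpuset foo 12
        key, value = args
    elif len(args) == 3:      # cgroup cpuset hello foo 12
        name, key, value = args

    # $UPSTART_CGROUP stands for the name Upstart would have picked itself.
    name = name.replace("$UPSTART_CGROUP", default_name)
    return CgroupStanza(controller, name, key, value)
```

For instance, `resolve_cgroup("cgroup cpuset hello foo 12", "db", "1")` yields the named group "hello" with the setting foo=12, while `resolve_cgroup("cgroup cpuset", "db", "1")` falls back to the implicit group "db-1".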
15:27 = Async spawning =
15:27 == Historical synchronous spawning ==
15:27 Upstart used to do the following when wishing to start a new job process:
15:27 1) Create a pipe.
15:27 2) fork itself.
15:27 3) Have the child do all necessary setup such as dropping privileges,
15:27 closing fds, switching apparmor profiles, etc. Then:
15:27 - If the child finished its setup successfully it simply exec'd the
15:27 relevant program (specified in the job .conf file for the job process
15:27 in question).
15:27 - But if the setup failed, the child wrote a status message back up to
15:27 the parent (PID 1) explaining what went wrong, and then exited.
15:27 4) All the time the child was doing setup, PID 1 was doing a _blocking
15:27 read_ on its end of the pipe. This implies that no operation in the
15:27 child setup phase could block, since if it could block, it would also
15:27 block PID 1, and thus DoS the system.
15:27 As a result, we couldn't call cgmanager from PID 1 directly, since that
15:27 could lead to a DoS, but we couldn't call it from the child _either_.
15:28 == Brave New World ==
15:28 The solution was to change the way in which Upstart spawns. In the new
15:28 world, Upstart still creates the pipe but doesn't do a blocking read; it
15:28 just adds the fd for the read end of the pipe to a queue and waits for
15:28 some notification from the child. So we now have asynchronous child
15:28 spawning. To achieve this, we had to increase the number of states the
15:28 job can be in, since there is now a distinction between:
15:28 - "the job process has been spawned successfully"
15:28 - "the job process is _being_ spawned" (but we don't know what the outcome is yet).
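The difference between the two spawning models can be sketched in Python. This is a toy illustration under stated assumptions, not Upstart's implementation: the helper names `spawn_async` and `poll_children` are hypothetical, the "queue" is modelled with the standard `selectors` module, and the child's setup phase is just a callable. It relies on `os.pipe()` creating close-on-exec descriptors, so a successful exec shows up in the parent as end-of-file on the pipe, while a failed setup shows up as an error message.

```python
import os
import selectors
import sys


def spawn_async(argv, setup, selector):
    """Fork a child, run setup() in it, then exec argv.

    The parent never blocks: it registers the read end of the pipe with
    the selector and returns the child's pid immediately.
    """
    read_fd, write_fd = os.pipe()  # both ends are close-on-exec
    pid = os.fork()
    if pid == 0:
        # Child: any failure before or during exec is reported up the pipe.
        try:
            os.close(read_fd)
            setup()
            os.execvp(argv[0], argv)  # on success this never returns
        except BaseException as exc:
            os.write(write_fd, repr(exc).encode())
        finally:
            os._exit(1)

    # Parent: keep only the read end and wait for news asynchronously.
    os.close(write_fd)
    os.set_blocking(read_fd, False)
    selector.register(read_fd, selectors.EVENT_READ, data=pid)
    return pid


def poll_children(selector, timeout=None):
    """Return (pid, error) pairs for children whose setup outcome is known.

    error is None when the pipe simply closed, i.e. the exec happened.
    """
    outcomes = []
    for key, _events in selector.select(timeout):
        message = os.read(key.fd, 4096)
        selector.unregister(key.fd)
        os.close(key.fd)
        outcomes.append((key.data, message.decode() or None))
    return outcomes


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    spawn_async([sys.executable, "-c", "pass"], lambda: None, sel)
    for child_pid, error in poll_children(sel, timeout=5):
        print(child_pid, "spawned" if error is None else "failed: " + error)
        os.waitpid(child_pid, 0)
```

The point of the design is that a slow or hung `setup()` only delays that one child's notification; the parent's event loop keeps running in the meantime.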
15:28 Here is the old state transition diagram:
15:28 http://people.canonical.com/~jhunt/upstart/upstart-states-old.png
15:28 And here's the new one:
15:28 http://people.canonical.com/~jhunt/upstart/upstart-states-new.png
15:28 However, that only solved half the problem - since Upstart was now
15:28 spawning asynchronously, the design meant that the order in which child
15:28 notifications could arrive became non-deterministic since either of the
15:28 following could happen "first":
15:28 - child exits and Upstart is notified via waitid().
15:28 - child pipe closes or has data written to it and Upstart is notified by select().
15:28 This needed careful handling since *both* those operations could update
15:28 the job state, but we didn't want the state to be "double-bumped".
15:28 In summary, adding the new async spawning feature was quite a challenge
15:28 with xnox and I gaining a few grey hairs in the process (his don't show!
15:28 Errm... ;-)
15:28 = Test Suite =
15:28 The new states and the new async nature of spawning meant that a
15:28 large chunk of the (large!) set of Upstart tests suddenly broke. Hence,
15:28 it took lots of careful reviewing of both the code and all the required
15:28 test changes to resolve this new feature.
15:28 = Stateful re-exec =
15:29 This needed updating to handle the new cgroup stanza data. But we also
15:29 needed to consider scenarios like this:
15:29 - PID 1 starts a job process asynchronously.
15:29 - child takes "a long time" to set up.
15:29 - PID 1 is restarted.
15:29 Post-re-exec, PID 1 needs to know to keep track of the outstanding child
15:29 setup operations it is (asynchronously) waiting on. To handle this, we
15:29 added a new JobProcessData object to store the transitory child setup
15:29 meta-data (which gets discarded once the child has either died or
15:29 responded down the pipe).
15:29 = Cgroup Operations =
15:29 With the advent of async spawning, the child now makes all necessary
15:29 calls on the cgmanager with PID 1 being completely immune to any issues
15:29 that that may entail. In fact, aside from storing the parsed cgroup
15:29 stanza data, all PID 1 does is store the address of cgmanager!
15:29 = Conclusion =
15:29 The final result is an extremely clean and safe design. By introducing
15:29 async spawning we were also able to make Upstart fully immune to the
15:29 child blocking PID 1 (there used to be a couple of areas that
15:29 theoretically could cause issues on a mis-configured system).
15:29 ---
15:29 [1] - FTR, I'm using ev's definition of 'brief' :-)
15:29 * jodh grumbles over whitespace damage....
15:29 A non-garbled version: http://paste.ubuntu.com/7915372/
15:29 heh
15:31 jodh: so, http://people.canonical.com/~jhunt/upstart/upstart-states-new.png is the state diagram for what's implemented now in 1.13?
15:31 slangasek: yep - I need to refresh the cookbook with that.
15:32 (also prettified the graph in graphviz for vertical top/down layout)
15:32 vs previously "optimal" graph
15:32 xnox: yeah, that's much improved, thanks! :)
15:32 still just the single error path, though; couldn't you have made it more complicated? ;)
15:33 jodh: very interesting! is there a timeout after which the child is just considered hung, and does upstart do anything about that state?
15:34 barry: no - no timeout.
15:34 barry: it just gets stuck in e.g. "starting/pre-starting" state.
15:34 barry: if the child hangs, the state will reflect that if you run 'initctl status $job'.
15:34 (or some such, can't remember exact names of the spawning state"
15:34 )
15:35 gotcha
15:35 * slangasek nods
15:35 if the job never gets around to starting, it's not init's job to fix it ;)
15:36 jodh: do we have people using the cgroup support in anger yet?
15:36 barry: If a job does hang though, you may get something useful in cgmanager's log (assuming you've set cgmanager_opts= in /etc/init/cgmanager.conf).
15:37 slangasek: I don't think so actually. I checked on a recent touch image and I can't see any evidence of it being used yet. We need to poke ted! :)
15:37 jodh: has ted been poked about this yet?
15:37 if not, then yes, yes you do ;)
15:38 slangasek: not by me directly. I added him to https://code.launchpad.net/~jamesodhunt/ubuntu/utopic/cgmanager/enable-upstart-cgroup-support/+merge/227209 so he should be aware of it.
15:39 slangasek: i'll poke ted about it.
15:39 xnox: ok, thanks
15:39 let's all poke ted. err...
15:40 any other questions about cgroups in upstart?
15:42 jodh: thanks for presenting!
15:42 [TOPIC] AOB
15:42 slangasek: I was thinking maybe you and stgraber could discuss bug 1314616?
15:42 bug 1314616 in bitcoin (Ubuntu) "[SRU] bitcoin to be maintained upstream in PPA: Replace distro archive "bitcoin" bitcoin with an empty dummy package" [Undecided,Confirmed] https://launchpad.net/bugs/1314616
15:42 oh no
15:43 oh, that again?
15:43 stgraber: I had explicitly told them to submit an SRU to disable the daemon on upgrade
15:43 stgraber: and you are apparently not happy with the proposed solution
15:43 slangasek: I'm not?
15:43 stgraber: so yes, we should talk, but probably not during the meeting :-)
15:43 stgraber: that's what I heard!
15:44 there was an email to ubuntu-devel about it earlier this month
15:44 anyway, maybe we talk on #ubuntu-devel after the meeting?
15:44 slangasek: I think I was just unhappy that the reporter tried to get us to do things without going through the proper SRU process
15:44 ah :-)
15:45 slangasek: I don't care about the package itself and am perfectly happy to have it die one way or another :)
15:45 ok then!
15:45 anything else to discuss on this fine summer day?
15:46 I mostly complained to the reporter when he started nagging me as the current patch pilot to do something which needed discussion with the SRU team. Now that the SRU team has clearly been informed of it, someone should just sponsor a debdiff and be done with it.
15:46 * slangasek nods
15:47 #endmeeting