16:18 #startmeeting Cloud-init bi-weekly status
16:18 Meeting started Mon Jun 24 16:18:34 2019 UTC. The chair is blackboxsw. Information about MeetBot at http://wiki.ubuntu.com/meetingology.
16:18 Available commands: action commands idea info link nick
16:19 welcome to another episode of cloud-init status updates.
16:20 Cloud-init upstream uses this meeting as a platform for community updates, feature/bug discussions, and an opportunity to get some extra input on current development.
16:21 our format is the following topics: Previous Actions, Recent Changes, In-progress Development, Office Hours
16:21 anyone is welcome to participate, interject, make suggestions or ask questions
16:22 generally we try to host this meeting every two weeks on the day listed in the channel topic
16:23 #topic Previous Actions
16:23 last meeting
16:23 #link https://cloud-init.github.io/status-2019-06-10.html#status-2019-06-10
16:24 we had an action to follow up on any bugs related to installing ifupdown on a system that had netplan installed by default.
16:24 I believe we did see a bug come in from Azure about that.... checking for that bug id now
16:25 #1832381
16:25 bug #1832381
16:25 bug 1832381 in cloud-init (Ubuntu) "vm fails to boot due to conflicting network configuration when user switches from netplan to eni" [Undecided,Incomplete] https://launchpad.net/bugs/1832381
16:25 #link https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381
16:25 There is an action item on me to attach a log to that bug. Since the incident created by the customer was closed and we did not have permission to share his log, I will need to get a repro and retrieve the log. It's not very easy to trigger a MAC address change in Azure these days
16:25 thanks AnhVoMSFT for this bug
16:27 ok if we carry over that action item to the next status meeting, AnhVoMSFT? (just to close the loop if it's important)
16:27 yep - once I get some help from our networking folks to trigger a MAC address change I'll update the bug with more logs
16:27 #action Touch base with AnhVoMSFT by next status on priority of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381
16:27 * meetingology Touch base with AnhVoMSFT by next status on priority of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381
16:27 Ubuntu bug 1832381 in cloud-init (Ubuntu) "vm fails to boot due to conflicting network configuration when user switches from netplan to eni" [Undecided,Incomplete]
16:28 good deal. that's all we had for actions from last meeting
16:28 #topic Recent Changes
16:29 the following items have landed on tip of cloud-init's master branch
16:30 - sysconfig: support more bonding options [Penghui Liao]
16:30 - cloud-init-generator: use libexec path to ds-identify on redhat systems
16:30 [Ryan Harper] ([LP: #1833264](https://bugs.launchpad.net/bugs/1833264))
16:30 - tools/build-on-freebsd: update to python3 [Gonéri Le Bouder]
16:30 Ubuntu bug 1833264 in cloud-init "cloud-init-generator hardcodes path to ds-identify" [Undecided,Fix committed]
16:30 thanks to Penghui and Gonéri for driving additional changes for cloud-init in this last sessions
16:30 session*
16:31 #topic In Progress Development
16:32 there are a number of longer items for feature work in progress that should see some light soon
16:33 We track these features in trello as always
16:33 #link https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin
16:33 minor fixup for Azure instance-data.json (cloud-init query) for region and availability zone should land today
16:34 rharper and blackboxsw are working on Azure-related route tables and async disk mount features
16:36 is there any bug/discussion item for the async disk mount?
16:37 AnhVoMSFT: rharper has been testing out systemd unit magic for setting up disk mounts async and initial numbers look good. How to bake that work into cloud-init is the next small hurdle I think. (I thought he mentioned today in our standup a 50% speed increase due to async mounts instead of sync waits)
16:38 https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+ref/feature/disk_setup_async is the WIP branch
16:39 AnhVoMSFT: I expect we'll have something in the next couple of days.
16:39 orrr right now. thanx rharper
16:39 that sounds really cool. I'll check it out
16:39 * blackboxsw creates a trello card that can be watched for this feature
16:40 #link https://trello.com/c/TMK5ZDMf/1108-azure-async-disk-mounts
16:41 feel free to subscribe to any trello cards folks see that are of interest. you will get an email if the card changes state, like from Doing to Done or if new links are added
16:42 Odd_Bloke: rharper process question
16:42 what do you guys think about us turning on voting on trello cards
16:42 people with interest in a feature/card in our backlog could upvote it and that *could* help drive what features we grab over time
16:43 dunno, thought it might be something we could toss around to see if that would make sense. the board it public after all
16:43 *is public* rather
16:43 maybe; I worry about random +1 without any more context. Platform developers already work with us; and community folks file bugs/merge proposals
16:44 good point.
16:44 I'm open to the idea
16:45 for sure, if it gets interest, we can think about adding that feature. can't hurt to have some additional input, unfounded though it may be.
16:45 agreed, the usefulness might be limited. You guys are already talking to each other. Platform developers either engage directly on this board or through out-of-band channels (sync meetings with Canonical product groups, etc...)
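The Azure instance-data.json fixup for region and availability zone comes up again in office hours, where its safety rests on chained dictionary lookups: if the 'imds' key (or any key below it) is absent, the final get of 'location' or 'platformFaultDomain' simply yields None. A minimal Python sketch of that pattern follows. The helper name get_region_and_az and the exact key path (imds, then compute, then location/platformFaultDomain) are assumptions for illustration, not cloud-init's actual DataSourceAzure code.

```python
from typing import Any, Dict, Optional, Tuple


def get_region_and_az(metadata: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: read region and fault domain from Azure metadata.

    The key path imds -> compute -> location/platformFaultDomain is an
    assumption for illustration. Each intermediate .get() defaults to an
    empty dict, so a missing key at any level makes the final lookup
    return None instead of raising KeyError.
    """
    compute = metadata.get("imds", {}).get("compute", {})
    region = compute.get("location")  # None if absent at any level above
    availability_zone = compute.get("platformFaultDomain")
    return region, availability_zone


# Missing 'imds' key: both values quietly fall back to None.
print(get_region_and_az({}))  # (None, None)
```

Note that the empty-dict defaults only cover keys that are absent; a key that is present but set to null (for example {"imds": None}) would still need its own guard.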
16:46 Perhaps you can try it out for a couple of release periods and see how it works out
16:46 yeah, /me just likes all the shiny objects pretty icons ;) ... need to control myself
16:47 thx AnhVoMSFT +1.
16:48 so I think that about wraps in-progress development. I know paride has been tirelessly working on our CI infrastructure to improve the quality of CI and reduce false-positive failures due to resource constraints. So big thanks to paride for working on our jenkins workers
16:48 #topic Office Hours (next ~30 mins)
16:49 This is an open topic to bring any cloud-init discussions, bugs, concerns or feature requests folks have.
16:49 In the absence of such topics we spend part of this time grooming the review queue to get back to dev
16:50 contributors so that they don't have stale branches waiting for input
16:50 We had a review sent out to add some boot time telemetry collection as part of cloud-init analyze: https://code.launchpad.net/~samgilson/cloud-init/+git/cloud-init/+merge/368943
16:50 thanks AnhVoMSFT I'll grab a review slot on that one now
16:51 would appreciate some reviews there and also ideas on how to retrieve similar timestamps for FreeBSD
16:51 AnhVoMSFT: yes, will review
16:51 AnhVoMSFT: also, I filed a bug related to the azure telemetry, lemme get it
16:51 I'll kick off a CI run on that now
16:51 rharper: ^
16:52 Bug 1833731
16:52 bug 1833731 in cloud-init "cloud-init analyze output not formatted cleanly on Azure" [Undecided,New] https://launchpad.net/bugs/1833731
16:52 AnhVoMSFT: not sure if the branch for review addresses the formatting of the output, but we should take a look to clean it up
16:52 is there a good way to subscribe to new bugs with a certain keyword/tags? I.e., I would like to auto-subscribe to all bugs that have "Azure" in the bug title
16:52 rharper: if you get a chance to double check https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/369199 we might be able to land that too
16:53 blackboxsw: I asked you some questions, if you've replied, I'll look again
16:53 rharper: nevermind, I see you already looked at it
16:53 ah
16:53 perfect
16:53 thanks
16:53 I think we're mostly fine; just a question on return values
16:53 * blackboxsw needed to refresh
16:54 rharper I will take a look at the analyze output and see how we can improve it. If it is a minor change we can add it to the existing review
16:54 AnhVoMSFT: no need to pull it into the existing stuff
16:55 I'd prefer a separate targeted fix, which may land independently from the boot stage (which is super interesting on its own)
16:55 cool - we will do a separate fix then
16:55 rharper: yeah that concern is clear, we are safe on the processing of region/az in absence of 'imds' key
16:55 it'll return None
16:57 by virtue of that last get('location|platformFaultDomain')
16:57 if either is absent due to any key above being absent, you'll get None as default value
16:58 blackboxsw: ack
17:00 Hi there, not sure if this is the right place to ask, but I have problems when creating a new VM; it only happens with the Debian cloud image, Ubuntu is fine. Booting is stuck at the drm line; the exact line depends on the video model type in my libvirt XML, but it is basically stuck for 20-30 sec and won't continue. It will boot eventually after that time. Thanks so much for any hints. Happy to provide further details.
17:02 hrm, video model timeouts are a bit out of my wheelhouse :/
17:02 * blackboxsw pokes around a bit in google
17:02 it seems that the lines after it would be about resizing the file system. I am not really sure if this is cloud-init related at all, or whether it is actually caused by the video model or just takes a while to get to the next steps
17:03 nik736: you can run cloud-init analyze show or cloud-init analyze blame to see what cloud-init says it is spending a lot of time on
17:03 I tried different host systems, Ubuntu 18.04, 19.04, Debian 9, different libvirt versions, different qemu versions, nothing seems to be helping lol
17:03 (If you have cloud-init v 18 or later in your image I think)
17:04 ah, ok, thanks, I will look into that
17:04 nik736: also systemd-analyze blame is a good helper for what is killing boot time
17:04 do you see any timestamp gap that reflects the 20-30s in cloud-init.log?
17:08 nik736: feel free to file a bug and attach logs from the 'cloud-init collect-logs' output (or serial console if available) and /var/log/cloud-init.log if you can get into the instance afterwards
17:08 thanks for the help, currently looking into it
17:11 rharper blackboxsw we have some instance deployments where cloud-init is hanging at the 'ip route add' command - any idea how to look further?
17:12 this does look like a platform problem, so it is more of a question related to networking, rather than cloud-init itself
17:14 it's super hard to reproduce so the only thing we have so far to work with is logs. I thought the call to ip route add basically adds an entry to the kernel routing table. Is there an interaction with networking involved which might cause it to hang?
17:14 AnhVoMSFT: I wonder if it's creating a route that breaks connection to IMDS or something else that cloud-init would then do an HTTP get on?
17:17 I saw in the log that 2 entries are around 1 minute apart: "SUCCESS: searching for local datasources" and "Cloud-init v. 0.7.9 running 'init' at Mon, 24 Jun 2019 17:13:41 +0000. Up 73.67 seconds." I am not sure if this could be it or if this looks fine
17:18 0.7.9 is quite old, seeing the full cloud-init.log will be most useful for us to understand what's happening
17:19 okay, sec
17:19 rharper that is a good theory. I do see in a good case there's a call to IMDS immediately after that, although that call has a timeout. If it fails we should see more logs coming out of cloud-init. I'll look further into that today
17:20 @rharper https://pastebin.com/fzCSH5kC
17:20 AnhVoMSFT: the retry logic in DataSourceAzure is quite long IIRC, so it's quite possible this is the very issue that blackboxsw is working on w.r.t. ensuring the instances always have a source-ip route to the IMDS
17:21 rharper indeed it is long, and the log was overly suppressed to keep it from growing too large while the VM was waiting in pre-provisioning state. We are adding back some of the logs (in a smarter way to get enough details while avoiding huge log size)
17:22 nik736: so, between line 260 and 261 there's a large timedelta; that's *outside* of cloud-init; cloud-init is executed separately 4 times (cloud-init init --local, cloud-init init, cloud-init modules --mode config, cloud-init modules --mode final)
17:23 nik736: so if you have a systemd journal, we could see what happens between the end of cloud-init-local.service and cloud-init.service (stage1 and 2);
17:23 ah, okay, interesting
17:23 will check
17:23 or syslog might show stuff between those two time points
17:23 * rharper steps away for a bit, please keep sending info here; I'll respond when I'm back
17:24 nik736 systemd-analyze critical-chain cloud-init.service might help here - I think some systemd service is running right after init-local and just before init and that service is taking time
17:25 will check, thanks for your help, really appreciate it.
17:32 I think I'll wrap the meeting here, but we can continue the conversation. Thanks again folks for the discussions
17:33 next meeting will be July 8th
17:33 as updated in the topic
17:33 meeting minutes will be posted to
17:33 #link https://cloud-init.github.io
17:33 #endmeeting
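The office-hours troubleshooting comes down to finding where wall-clock time goes between consecutive cloud-init.log entries: a large gap between two lines can mean the time was spent outside cloud-init, between its separately executed stages. A minimal Python sketch of that check follows. It assumes each log entry starts with a timestamp like 2019-06-24 17:13:41,123 (a common cloud-init default, but older images may log in syslog style, so check the image's logging configuration); the names parse_ts and largest_gap are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Assumed entry prefix, e.g. "2019-06-24 17:13:41,123 - util.py[DEBUG]: ..."
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LEN = len("2019-06-24 17:13:41,123")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[float, int, int]]:
    """Find the biggest pause between consecutive timestamped lines.

    Returns (gap_seconds, earlier_line_no, later_line_no) with 1-based
    line numbers, or None if fewer than two timestamped lines exist.
    """
    best = None
    prev_ts, prev_no = None, None
    for no, line in enumerate(lines, start=1):
        ts = parse_ts(line)
        if ts is None:
            continue  # skip tracebacks and other untimestamped lines
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_no, no)
        prev_ts, prev_no = ts, no
    return best


# Synthetic sample lines, not taken from a real log.
sample = [
    "2019-06-24 17:12:40,100 - example.py[DEBUG]: first entry",
    "2019-06-24 17:13:41,600 - example.py[DEBUG]: second entry",
]
print(largest_gap(sample))  # (61.5, 1, 2)
```

Calling largest_gap on the lines of /var/log/cloud-init.log would point at the two entries bracketing the longest pause; systemd-analyze blame or systemd-analyze critical-chain cloud-init.service can then show which unit ran in that window.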