WEBVTT
Kind: captions
Language: en-US

00:00:01.617 --> 00:00:04.000
[silence]

00:00:04.000 --> 00:00:07.520
Hello, and thank you for the opportunity to talk to you about the

00:00:07.520 --> 00:00:11.280
work we’ve been doing on deep learning for earthquake monitoring.

00:00:11.280 --> 00:00:15.576
My name is Greg Beroza. I’m at Stanford University.

00:00:15.600 --> 00:00:19.760
The funding sources for this work are shown on the left,

00:00:19.760 --> 00:00:24.640
and the participants in it are shown on the right.

00:00:24.640 --> 00:00:29.256
Initially, all at Stanford, but they’re starting to scatter to the four winds.

00:00:29.280 --> 00:00:34.400
So this figure shows schematically what is involved in earthquake monitoring.

00:00:34.400 --> 00:00:38.480
We have our input, which is ground motion from a seismic network.

00:00:38.480 --> 00:00:41.736
We don’t know ahead of time when the earthquakes are going to happen,

00:00:41.760 --> 00:00:47.360
so we have to work with that data to detect the events, to pick the phases,

00:00:47.360 --> 00:00:51.840
to associate those phases with events, then to locate and characterize the

00:00:51.840 --> 00:00:55.736
earthquakes. And then, the output from that

00:00:55.760 --> 00:01:02.400
is a comprehensive catalog of discrete seismic events –

00:01:02.400 --> 00:01:05.736
earthquakes, quarry blasts, whatever.

00:01:05.760 --> 00:01:09.360
My group has been working systematically to develop machine

00:01:09.360 --> 00:01:17.336
learning-based approaches to plug into this workflow.

00:01:17.360 --> 00:01:20.320
I’ll talk a bit about PhaseNet, which is a machine learning-based

00:01:20.320 --> 00:01:24.960
method for picking seismic phases. I’m not going to talk about

00:01:24.960 --> 00:01:29.336
DeepDenoiser, but that’s a method for separating signal

00:01:29.360 --> 00:01:35.016
from noise, such that we get better catalogs.
00:01:35.040 --> 00:01:38.560
We’ve developed a deep Bayesian neural network for single-station

00:01:38.560 --> 00:01:40.960
earthquake location. This is relevant when we have

00:01:40.960 --> 00:01:45.840
sparse networks or when the earthquakes are small and difficult

00:01:45.840 --> 00:01:50.000
to detect on multiple stations. Same goes for magnitude

00:01:50.000 --> 00:01:53.495
determination. We have a single-station

00:01:53.520 --> 00:01:58.296
magnitude determination neural network that will allow that.

00:01:58.320 --> 00:02:04.880
And our sort of second-generation earthquake detection network –

00:02:04.880 --> 00:02:07.520
I’ll talk a bit about that. It’s a multi-task network in that

00:02:07.520 --> 00:02:11.200
it does both event detection and phase picking.

00:02:11.200 --> 00:02:17.256
And, because it does those together, it does better at both.

00:02:17.280 --> 00:02:21.840
So I should mention that deep learning is data-hungry.

00:02:21.840 --> 00:02:26.880
And what I mean by that is that it requires a lot of data – many examples –

00:02:26.880 --> 00:02:31.920
for a neural network to learn to, say, pick seismic phases in a –

00:02:31.920 --> 00:02:36.240
in a way that’s reliable and that can work generally.

00:02:36.240 --> 00:02:40.640
Now, seismology is very fortunate in that we have – through the hard work

00:02:40.640 --> 00:02:46.720
of analysts, we have very large data sets. These data sets are labeled already,

00:02:46.720 --> 00:02:51.600
which is fantastic. But the labels are not perfect.

00:02:51.600 --> 00:02:55.520
That is, there are lots of errors and omissions in the catalog.

00:02:55.520 --> 00:02:59.120
And these have the potential to undermine our deep learning-based

00:02:59.120 --> 00:03:02.856
approaches if we don’t eliminate them to the extent that we can.

00:03:02.880 --> 00:03:06.960
So, one issue with earthquake catalogs is that sometimes there are

00:03:06.960 --> 00:03:10.320
earthquakes that aren’t really there.
And, hence, we would be teaching

00:03:10.320 --> 00:03:13.280
a neural network the wrong thing. This is probably not too much

00:03:13.280 --> 00:03:18.320
of a problem. Much more insidious is the fact that we have extra earthquakes.

00:03:18.320 --> 00:03:23.360
So shown here are our waveforms – three-component waveforms,

00:03:23.360 --> 00:03:26.536
P wave and S wave picks. So an earthquake detection there.

00:03:26.560 --> 00:03:29.280
Here’s an earthquake – a smaller earthquake aftershock

00:03:29.280 --> 00:03:31.720
that’s not in [audio cuts out].

00:03:31.720 --> 00:03:38.080
And, if we were to not have a label for that, or to label it as noise, our network

00:03:38.080 --> 00:03:40.880
would learn exactly the wrong thing. That is, our neural network,

00:03:40.880 --> 00:03:43.120
not our seismic network. So we absolutely have to

00:03:43.120 --> 00:03:47.920
get rid of these extra earthquakes or label them as earthquakes.

00:03:47.920 --> 00:03:51.280
And then, as all seismologists know, there are errors in picks.

00:03:51.280 --> 00:03:53.920
And so, if we’re trying to pick arrival times,

00:03:53.920 --> 00:03:58.000
we have to deal with those errors.

00:03:58.000 --> 00:04:03.600
So, in order to get a very high-quality data set, we curated one.

00:04:03.600 --> 00:04:06.960
That is, we collected many millions of waveforms.

00:04:06.960 --> 00:04:10.880
We did quality control on them, and we added labels.

00:04:10.880 --> 00:04:14.480
We called this the Stanford Earthquake Data Set, or STEAD –

00:04:14.480 --> 00:04:19.040
1.2 million seismograms. So it’s not a gigantic data set.

00:04:19.040 --> 00:04:23.200
But it comes from 500,000 earthquakes recorded at different depths and

00:04:23.200 --> 00:04:28.536
around the world, as shown here, in a variety of tectonic settings.
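STEAD is distributed in HDF5 precisely so that a data scientist can get at the waveforms with a few lines of generic code. As a rough sketch of that workflow, the snippet below writes and then reads a tiny STEAD-like file; the group name, trace name, and attribute keys are illustrative assumptions, not the data set’s actual schema.

```python
import h5py
import numpy as np

# Build a tiny file mimicking an assumed STEAD-style layout: one HDF5
# dataset per trace, shape (samples, 3), with analyst pick indices
# stored as attributes. All names and keys here are hypothetical.
with h5py.File("demo_stead.hdf5", "w") as f:
    grp = f.create_group("data")
    trace = np.random.randn(6000, 3).astype("float32")  # 60 s at 100 Hz
    d = grp.create_dataset("EV000001.EX_20110325_EV", data=trace)
    d.attrs["p_arrival_sample"] = 400
    d.attrs["s_arrival_sample"] = 700

# Reading it back needs no seismological exchange formats -- just h5py.
with h5py.File("demo_stead.hdf5", "r") as f:
    for name, dset in f["data"].items():
        waveform = dset[()]                          # (6000, 3) array
        p_idx = int(dset.attrs["p_arrival_sample"])  # pick label
        print(name, waveform.shape, p_idx)
```

The point is the low barrier to entry: each labeled seismogram is an ordinary array plus attributes, which is all a deep learning pipeline needs.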
00:04:28.560 --> 00:04:33.760
All these seismographs are – or, seismograms are from local

00:04:33.760 --> 00:04:38.720
distances only – less than a couple hundred kilometers.

00:04:38.720 --> 00:04:43.680
And the reason for that is that our initial motivation is to locate small local

00:04:43.680 --> 00:04:48.000
earthquakes because these are the ones most likely to be missing from catalogs.

00:04:48.000 --> 00:04:55.600
So this STEAD waveform database is collected from a range of

00:04:55.600 --> 00:04:57.920
different seismometers, different kinds of sensors.

00:04:57.920 --> 00:05:01.336
We have examples of signal and of noise.

00:05:01.360 --> 00:05:05.176
We’ve added many labels, like coda duration.

00:05:05.200 --> 00:05:09.040
And I mentioned the quality control. And we’ve also put it in an HDF5 format

00:05:09.040 --> 00:05:14.720
for easy I/O and to recruit expertise from data scientists who don’t like –

00:05:14.720 --> 00:05:16.880
you know, who shouldn’t be burdened with learning

00:05:16.880 --> 00:05:20.560
our seismological exchange formats and so forth.

00:05:20.560 --> 00:05:26.720
So first, I’ll talk briefly about PhaseNet, which was developed using data from

00:05:26.720 --> 00:05:31.200
the Northern California Seismic Network. PhaseNet – the input is

00:05:31.200 --> 00:05:35.520
three-component seismic data. The output is picks of P waves

00:05:35.520 --> 00:05:39.520
and picks of S waves. So that probability – the green –

00:05:39.520 --> 00:05:43.840
the green here is a truncated Gaussian centered on the

00:05:43.840 --> 00:05:49.120
P wave arrival time in the catalog. This one is centered on the

00:05:49.120 --> 00:05:54.640
S wave arrival time. And the idea is that we engineer it such that the predicted

00:05:54.640 --> 00:06:00.905
probability peaks on the P wave arrival time and on the S wave arrival time.

00:06:01.680 --> 00:06:07.600
And it works. This just shows examples from different kinds of data.
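The truncated-Gaussian training target just described can be sketched in a few lines. The width and the truncation at three standard deviations below are illustrative choices, not the published PhaseNet parameters.

```python
import numpy as np

def pick_target(n_samples, pick_idx, width=20):
    """Truncated Gaussian probability mask centered on an analyst pick.

    Mirrors a PhaseNet-style training label: the probability peaks at
    the catalog arrival sample and falls to zero nearby. `width` is the
    standard deviation in samples (an assumed value, for illustration).
    """
    t = np.arange(n_samples)
    target = np.exp(-0.5 * ((t - pick_idx) / width) ** 2)
    target[np.abs(t - pick_idx) > 3 * width] = 0.0  # truncate the tails
    return target

target = pick_target(3000, pick_idx=1000)
print(target.argmax())  # peaks exactly on the pick: 1000
```

Training the network to reproduce one such curve per phase (plus a flat zero curve for noise) is what turns picking into a supervised prediction problem.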
00:06:07.600 --> 00:06:12.640
Nice data, clipped data, noisy data. There’s a lot of each kind of these

00:06:12.640 --> 00:06:18.080
situations in the – in the data set. So it learns, just like a seismologist

00:06:18.080 --> 00:06:24.456
would, to make the pick despite challenges that it might encounter.

00:06:24.480 --> 00:06:28.160
So it works quite well. So at the top is a histogram

00:06:28.160 --> 00:06:33.816
of PhaseNet pick errors for P and S waves. Below is the same for the sort of standard

00:06:33.840 --> 00:06:37.520
non-machine learning-based approach – characteristic function-based approach.

00:06:37.520 --> 00:06:42.800
We can see that the P wave is zero mean with a narrow – with a small variance –

00:06:42.800 --> 00:06:48.080
not much dispersion about that mean. The S wave is not as good as the

00:06:48.080 --> 00:06:50.720
P wave, but it’s dramatically better for the machine learning-based

00:06:50.720 --> 00:06:57.200
catalog than for the – for the sort of standard approach.

00:06:57.200 --> 00:07:01.680
And I should mention that our ground truth here is not exactly truth.

00:07:01.680 --> 00:07:04.960
These are analyst-reviewed picks, which are taken as correct,

00:07:04.960 --> 00:07:08.913
but, of course, themselves can include errors as well.

00:07:09.600 --> 00:07:12.080
I mentioned Earthquake Transformer is our next-generation

00:07:12.080 --> 00:07:15.576
earthquake detection and phase-picking algorithm.

00:07:15.600 --> 00:07:19.680
This is a very deep – meaning many-layered – neural network.

00:07:19.680 --> 00:07:22.000
In addition to its depth, there are different kinds of layers.

00:07:22.000 --> 00:07:25.760
So there are convolutional layers, which are meant to characterize

00:07:25.760 --> 00:07:30.240
the characteristic features of waveforms.

00:07:30.240 --> 00:07:33.864
Those are shown in pink.

00:07:34.800 --> 00:07:37.760
There are – shown in yellow, there are recurrent layers.
00:07:37.760 --> 00:07:41.840
These are meant to capture the sequencing [audio cuts out], say, the

00:07:41.840 --> 00:07:48.800
P wave before the S wave to properly represent what we know to be the case.

00:07:48.800 --> 00:07:54.560
And then there are these so-called attention mechanisms that use the –

00:07:54.560 --> 00:07:58.560
that focus the attention of the neural network on the parts that matter for

00:07:58.560 --> 00:08:02.560
detection, P wave arrival picking, and S wave arrival picking.

00:08:02.560 --> 00:08:06.960
So there’s seismogram output – or, input and three different outputs.

00:08:06.960 --> 00:08:09.920
One is when the earthquake is, P wave arrival time, and the

00:08:09.920 --> 00:08:13.920
S wave arrival time – the latter just like PhaseNet.

00:08:13.920 --> 00:08:16.560
It works better than PhaseNet. And the reason it works better

00:08:16.560 --> 00:08:21.016
is because, like a seismologist, it uses the waveform context

00:08:21.040 --> 00:08:23.640
to pick the P waves and the S waves.

00:08:23.640 --> 00:08:28.056
So it doesn’t just focus on the narrow little P wave arrival.

00:08:28.080 --> 00:08:34.320
It works better not only than PhaseNet, but it works better than other machine

00:08:34.320 --> 00:08:38.296
learning-based approaches for picking P waves and S waves.

00:08:38.320 --> 00:08:42.000
In addition to PhaseNet, there are four other approaches shown here.

00:08:42.000 --> 00:08:44.640
Same kind of histogram – P waves and S waves.

00:08:44.640 --> 00:08:48.800
And Earthquake Transformer, for now, is the – is the best-performing

00:08:48.800 --> 00:08:53.840
of the – of the lot. So let me show you some results that we’ve gotten

00:08:53.840 --> 00:08:59.576
using PhaseNet – not Earthquake Transformer, but PhaseNet –

00:08:59.600 --> 00:09:06.936
for phase picking as part of this earthquake monitoring workflow.
00:09:06.960 --> 00:09:10.480
We applied it to the Guy-Greenbrier, Arkansas, earthquake, already

00:09:10.480 --> 00:09:13.840
a well-studied earthquake sequence thought to be induced.

00:09:13.840 --> 00:09:16.400
We developed a complete machine learning-based catalog

00:09:16.400 --> 00:09:21.840
for this using PhaseNet. We did not re-train it.

00:09:21.840 --> 00:09:26.400
That is, this is PhaseNet as trained on NCSN data, yet it generalizes well

00:09:26.400 --> 00:09:29.760
to these smaller, closer, shallower earthquakes that occur in a different

00:09:29.760 --> 00:09:33.200
geologic setting, perhaps by a different mechanism.

00:09:33.200 --> 00:09:38.080
There are 90,000 events in the catalog. There would be hundreds of thousands

00:09:38.080 --> 00:09:44.536
if we had more stations to work with. Our stations are very limited.

00:09:44.560 --> 00:09:48.320
These events, as I said, were thought to be induced – or thought to be induced

00:09:48.320 --> 00:09:53.280
through a combination of hydraulic stimulation and deep disposal wells.

00:09:53.280 --> 00:09:57.280
Clara Yoon showed the importance of hydraulic stimulation

00:09:57.280 --> 00:10:01.120
to the early parts of this sequence.

00:10:01.120 --> 00:10:04.480
So what did we learn? Well, we learned that this is actually

00:10:04.480 --> 00:10:09.520
the superposition of two sequences. So what this shows is latitude

00:10:09.520 --> 00:10:14.720
versus calendar time – the evolution of seismicity in our catalog.

00:10:14.720 --> 00:10:17.200
You can see these characteristic parabolic shapes.

00:10:17.200 --> 00:10:20.216
You can fit a diffusivity to that. It’s quite high.

00:10:20.240 --> 00:10:24.056
Order of 10 to 100 meters squared per second.
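The parabolic shapes and the diffusivity fit come from the standard pore-pressure triggering-front relation r(t) = sqrt(4πDt): distance from the injection point grows as the square root of time, so the slope of r² versus t gives the diffusivity D. A minimal sketch with synthetic numbers (not the Guy-Greenbrier data):

```python
import numpy as np

# Triggering front for pore-pressure diffusion: r(t) = sqrt(4*pi*D*t).
# Given front positions r_i at times t_i, D follows from a straight-line
# fit of r^2 against t. D_true is an assumed value inside the
# 10-100 m^2/s range quoted in the talk.
D_true = 30.0                                   # m^2/s
t = np.linspace(3600, 30 * 86400, 50)           # 1 hour to 30 days, in s
r = np.sqrt(4 * np.pi * D_true * t)             # front distance, in m

# slope of r^2 vs. t, divided by 4*pi, recovers the diffusivity
D_est = np.polyfit(t, r**2, 1)[0] / (4 * np.pi)
print(round(D_est, 1))  # -> 30.0
```

In practice one fits the envelope of the earliest events at each distance rather than a noiseless curve, but the arithmetic is the same.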
00:10:24.080 --> 00:10:29.440
Also notice that the sequence starts up here, propagates to the south,

00:10:29.440 --> 00:10:33.440
and then sort of putters on, and then it re-nucleates – or, another

00:10:33.440 --> 00:10:37.840
sequence nucleates down here and spreads back towards where the

00:10:37.840 --> 00:10:41.920
sequence began as well as to the south. And it’s the second part of the sequence

00:10:41.920 --> 00:10:46.376
that had the magnitude 4.6 that shut down the –

00:10:46.400 --> 00:10:49.840
shut down the wells – 1 and 5. And yet, this – you know,

00:10:49.840 --> 00:10:56.695
there’s a chance that this Well 2 was involved in triggering this sequence

00:10:56.720 --> 00:10:59.600
because that sequence actually nucleated farther to the south.

00:10:59.600 --> 00:11:02.400
So some interesting new insights into this already

00:11:02.400 --> 00:11:05.336
well-studied earthquake sequence.

00:11:05.360 --> 00:11:10.080
We’ve applied PhaseNet to the Amatrice – well, to the Apennines

00:11:10.080 --> 00:11:16.080
sequence – the 2016-2017 part of that. It’s a complex normal-faulting

00:11:16.080 --> 00:11:21.336
sequence. It killed hundreds of people and left tens of thousands of people

00:11:21.360 --> 00:11:25.680
homeless in central Italy. And here, again, is a catalog comparison.

00:11:25.680 --> 00:11:30.160
The top four panels show the standard catalog – a very good catalog

00:11:30.160 --> 00:11:36.080
developed by scientists at INGV. It has 82,000 events in it during

00:11:36.080 --> 00:11:39.600
that one-year time period. That’s about one earthquake every six minutes.

00:11:39.600 --> 00:11:45.920
The PhaseNet catalog has over 10 times as many events – over 900,000 events.

00:11:45.920 --> 00:11:49.360
That’s one every 35 seconds. These are all really tiny earthquakes –

00:11:49.360 --> 00:11:53.200
all the new ones.
But they are illuminating new structures, and we’re

00:11:53.200 --> 00:11:59.096
in the very early stages of figuring out what more we might learn from them.

00:11:59.120 --> 00:12:02.880
Now I’d like to close by previewing some things that are coming soon –

00:12:02.880 --> 00:12:06.080
within the next year or so. One is a comprehensive application

00:12:06.080 --> 00:12:10.880
of PhaseNet to NCSN data led by Ian McBrearty and in collaboration

00:12:10.880 --> 00:12:15.040
with the Berkeley Seismo Lab. Richard Allen, thank you for buying

00:12:15.040 --> 00:12:18.480
a workstation so that we could run it over there and not move

00:12:18.480 --> 00:12:21.840
all that data to Stanford. Preliminary sampling –

00:12:21.840 --> 00:12:25.840
sort of random sampling – suggests hundreds of thousands of phase picks

00:12:25.840 --> 00:12:31.920
per day and about one earthquake per minute from this – from decades

00:12:31.920 --> 00:12:36.216
of continuous seismic data. So we’re going to have a lot to sift through.

00:12:36.240 --> 00:12:40.320
Every time I talk about this to regional seismic network operators,

00:12:40.320 --> 00:12:44.400
they ask about Pn and Sn. We have not worked on that.

00:12:44.400 --> 00:12:48.040
You know, because of the distances we use, we don’t have those in the catalog,

00:12:48.040 --> 00:12:52.960
so we’re working on an augmented label data set that has these difficult-to-pick

00:12:52.960 --> 00:12:59.736
phases that we will add to the STEAD database of local event phases.

00:12:59.760 --> 00:13:02.640
The one thing we haven’t done with deep learning is association.

00:13:02.640 --> 00:13:07.520
We have two approaches to that – one by Ian McBrearty using graph

00:13:07.520 --> 00:13:12.960
convolutional neural networks. The other by Weiqiang Zhu using

00:13:12.960 --> 00:13:17.600
a Gaussian mixture model approach.
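As a toy illustration of the association idea (not Zhu’s actual algorithm): with an assumed constant velocity, each P pick back-projects to a candidate origin time, and picks belonging to the same event cluster tightly in that coordinate. A Gaussian mixture model does this clustering probabilistically; the sketch below, on synthetic noiseless picks, uses simple gap splitting instead.

```python
import numpy as np

vp = 6000.0                                    # assumed P velocity, m/s
station_dist = np.array([5e3, 12e3, 20e3])     # station distances, m

true_origins = np.array([10.0, 75.0])          # two synthetic events, s
# arrival time = origin time + travel time, for every (event, station)
arrivals = (true_origins[:, None] + station_dist[None, :] / vp).ravel()
dists = np.tile(station_dist, 2)

# back-project each pick to a candidate origin time
origins = arrivals - dists / vp

# cluster by gap splitting (a mixture model would do this probabilistically,
# and could fold in amplitudes and locations as extra dimensions)
order = np.argsort(origins)
groups, current = [], [order[0]]
for i in order[1:]:
    if origins[i] - origins[current[-1]] > 5.0:  # 5 s gap threshold
        groups.append(current)
        current = []
    current.append(i)
groups.append(current)
print(len(groups))  # -> 2 clusters, i.e. two associated events
```

Real catalogs are messier: unknown locations, picking errors, and missed picks are exactly why a probabilistic mixture model beats a fixed threshold.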
Both these approaches are designed

00:13:17.600 --> 00:13:22.400
to work for heterogeneous catalogs that are large, that vary with time,

00:13:22.400 --> 00:13:25.656
on which small events are detected by only a few stations.

00:13:25.680 --> 00:13:28.880
Both methods use arrival time and amplitude information –

00:13:28.880 --> 00:13:34.640
at least they can. Also coming soon, led by Weiqiang Zhu, is an end-to-end

00:13:34.640 --> 00:13:41.280
method that combines these different modules into a single linked neural

00:13:41.280 --> 00:13:44.400
network. The advantage of this is that information is not lost through

00:13:44.400 --> 00:13:48.480
thresholding and, as suggested by the inset in the lower right,

00:13:48.480 --> 00:13:54.624
we’ve applied it to Ridgecrest, and it seems to work quite well.

00:13:55.520 --> 00:14:01.040
Finally, working with former postdoc Lise Retailleau and engineers at IPGP,

00:14:01.040 --> 00:14:06.400
we’ve applied PhaseNet to seismicity on the volcano – this is actually

00:14:06.400 --> 00:14:08.960
a volcano near Mayotte. It detects more events,

00:14:08.960 --> 00:14:13.280
despite not being retrained, than other methods, including SeisComP3.

00:14:13.280 --> 00:14:17.280
It improves locations. It’s now working in real time.

00:14:17.280 --> 00:14:22.320
It will be installed at the Martinique Volcano and Seismic Observatory

00:14:22.320 --> 00:14:26.480
later this year. And we, or they, are working on a modification

00:14:26.480 --> 00:14:30.480
of binder to exploit PhaseNet’s ability to pick P and S separately,

00:14:30.480 --> 00:14:32.640
which will lead to further improvements in the catalog.

00:14:32.640 --> 00:14:37.600
And you can see here PhaseNet versus the standard catalog.

00:14:37.600 --> 00:14:42.080
Again, we have much, much more information to work with.

00:14:42.080 --> 00:14:47.080
So I’ll stop there and take any questions when the time comes. Thank you.

00:14:49.365 --> 00:14:56.685
[silence]