WEBVTT
Kind: captions
Language: en-US

00:00:01.617 --> 00:00:04.000
[silence]

00:00:04.000 --> 00:00:07.520
Hello, and thank you for the opportunity to talk to you about the

00:00:07.520 --> 00:00:11.280
work we’ve been doing on deep learning for earthquake monitoring.

00:00:11.280 --> 00:00:15.576
My name is Greg Beroza. I’m at Stanford University.

00:00:15.600 --> 00:00:19.760
The funding sources for this work are shown on the left,

00:00:19.760 --> 00:00:24.640
and the participants in it are shown on the right.

00:00:24.640 --> 00:00:29.256
Initially, all at Stanford, but they’re starting to scatter to the four winds.

00:00:29.280 --> 00:00:34.400
So this figure shows schematically what is involved in earthquake monitoring.

00:00:34.400 --> 00:00:38.480
We have our input, which is ground motion from a seismic network.

00:00:38.480 --> 00:00:41.736
We don’t know ahead of time when the earthquakes are going to happen,

00:00:41.760 --> 00:00:47.360
so we have to work with that data to detect the events, to pick the phases,

00:00:47.360 --> 00:00:51.840
to associate those phases with events, then to locate and characterize the

00:00:51.840 --> 00:00:55.736
earthquakes. And then, the output from that

00:00:55.760 --> 00:01:02.400
is a comprehensive catalog of discrete seismic events –

00:01:02.400 --> 00:01:05.736
earthquakes, quarry blasts, whatever.

00:01:05.760 --> 00:01:09.360
My group has been working systematically to develop machine

00:01:09.360 --> 00:01:17.336
learning-based approaches to plug into this workflow.

00:01:17.360 --> 00:01:20.320
I’ll talk a bit about PhaseNet, which is a machine learning-based

00:01:20.320 --> 00:01:24.960
method for picking seismic phases. I’m not going to talk about

00:01:24.960 --> 00:01:29.336
DeepDenoiser, but that’s a method for separating signal

00:01:29.360 --> 00:01:35.016
from noise, such that we get better catalogs.
00:01:35.040 --> 00:01:38.560
We’ve developed a deep Bayesian neural network for single-station

00:01:38.560 --> 00:01:40.960
earthquake location. This is relevant when we have

00:01:40.960 --> 00:01:45.840
sparse networks or when the earthquakes are small and difficult

00:01:45.840 --> 00:01:50.000
to detect on multiple stations. Same goes for magnitude

00:01:50.000 --> 00:01:53.495
determination. We have a single-station

00:01:53.520 --> 00:01:58.296
magnitude determination neural network that will allow that.

00:01:58.320 --> 00:02:04.880
And our sort of second-generation earthquake detection network –

00:02:04.880 --> 00:02:07.520
I’ll talk a bit about that. It’s a multi-task network in that

00:02:07.520 --> 00:02:11.200
it does both event detection and phase picking.

00:02:11.200 --> 00:02:17.256
And, because it does those together, it does better at both.

00:02:17.280 --> 00:02:21.840
So I should mention that deep learning is data-hungry.

00:02:21.840 --> 00:02:26.880
And what I mean by that is that it requires a lot of data – many examples –

00:02:26.880 --> 00:02:31.920
for a neural network to learn to, say, pick seismic phases in a –

00:02:31.920 --> 00:02:36.240
in a way that’s reliable and that can work generally.

00:02:36.240 --> 00:02:40.640
Now, seismology is very fortunate in that we have – through the hard work

00:02:40.640 --> 00:02:46.720
of analysts, we have very large data sets. These data sets are labeled already,

00:02:46.720 --> 00:02:51.600
which is fantastic. But the labels are not perfect.

00:02:51.600 --> 00:02:55.520
That is, there are lots of errors and omissions in the catalog.

00:02:55.520 --> 00:02:59.120
And these have the potential to undermine our deep learning-based

00:02:59.120 --> 00:03:02.856
approaches if we don’t eliminate them to the extent that we can.

00:03:02.880 --> 00:03:06.960
So, one issue with earthquake catalogs is that sometimes there are

00:03:06.960 --> 00:03:10.320
earthquakes that aren’t really there.
And, hence, we would be teaching

00:03:10.320 --> 00:03:13.280
a neural network the wrong thing. This is probably not too much

00:03:13.280 --> 00:03:18.320
of a problem. Much more insidious is the fact that we have extra earthquakes.

00:03:18.320 --> 00:03:23.360
So shown here are our waveforms – three-component waveforms,

00:03:23.360 --> 00:03:26.536
P wave and S wave picks. So an earthquake detection there.

00:03:26.560 --> 00:03:29.280
Here’s an earthquake – a smaller earthquake aftershock

00:03:29.280 --> 00:03:31.720
that’s not in [audio cuts out].

00:03:31.720 --> 00:03:38.080
And, if we were to not have a label for that, or to label it as noise, our network

00:03:38.080 --> 00:03:40.880
would learn exactly the wrong thing. That is, our neural network,

00:03:40.880 --> 00:03:43.120
not our seismic network. So we absolutely have to

00:03:43.120 --> 00:03:47.920
get rid of these extra earthquakes or label them as earthquakes.

00:03:47.920 --> 00:03:51.280
And then, as all seismologists know, there are errors in picks.

00:03:51.280 --> 00:03:53.920
And so, if we’re trying to pick arrival times,

00:03:53.920 --> 00:03:58.000
we have to deal with those errors.

00:03:58.000 --> 00:04:03.600
So, in order to get a very high-quality data set, we curated one.

00:04:03.600 --> 00:04:06.960
That is, we collected many millions of waveforms.

00:04:06.960 --> 00:04:10.880
We did quality control on them, and we added labels.

00:04:10.880 --> 00:04:14.480
We called this the Stanford Earthquake Data Set, or STEAD –

00:04:14.480 --> 00:04:19.040
1.2 million seismograms. So it’s not a gigantic data set.

00:04:19.040 --> 00:04:23.200
But it comes from 500,000 earthquakes recorded at different depths and

00:04:23.200 --> 00:04:28.536
around the world, as shown here, in a variety of tectonic settings.
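STEAD is distributed in HDF5 precisely so that a data scientist can get at the waveforms with a few lines of generic code. As a rough sketch of that workflow, the snippet below writes and then reads a tiny STEAD-like file; the group name, trace name, and attribute keys are illustrative assumptions, not the data set’s actual schema.

```python
import h5py
import numpy as np

# Build a tiny file mimicking an assumed STEAD-style layout: one HDF5
# dataset per trace, shape (samples, 3), with analyst pick indices
# stored as attributes. All names and keys here are hypothetical.
with h5py.File("demo_stead.hdf5", "w") as f:
    grp = f.create_group("data")
    trace = np.random.randn(6000, 3).astype("float32")  # 60 s at 100 Hz
    d = grp.create_dataset("EV000001.EX_20110325_EV", data=trace)
    d.attrs["p_arrival_sample"] = 400
    d.attrs["s_arrival_sample"] = 700

# Reading it back needs no seismological exchange formats -- just h5py.
with h5py.File("demo_stead.hdf5", "r") as f:
    for name, dset in f["data"].items():
        waveform = dset[()]                          # (6000, 3) array
        p_idx = int(dset.attrs["p_arrival_sample"])  # pick label
        print(name, waveform.shape, p_idx)
```

The point is the low barrier to entry: each labeled seismogram is an ordinary array plus attributes, which is all a deep learning pipeline needs.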
00:04:28.560 --> 00:04:33.760
All these seismographs are – or, seismograms are from local

00:04:33.760 --> 00:04:38.720
distances only – less than a couple hundred kilometers.

00:04:38.720 --> 00:04:43.680
And the reason for that is that our initial motivation is to locate small local

00:04:43.680 --> 00:04:48.000
earthquakes because these are the ones most likely to be missing from catalogs.

00:04:48.000 --> 00:04:55.600
So this STEAD waveform database is collected from a range of

00:04:55.600 --> 00:04:57.920
different seismometers, different kinds of sensors.

00:04:57.920 --> 00:05:01.336
We have examples of signal and of noise.

00:05:01.360 --> 00:05:05.176
We’ve added many labels, like coda duration.

00:05:05.200 --> 00:05:09.040
And I mentioned the quality control. And we’ve also put it in an HDF5 format

00:05:09.040 --> 00:05:14.720
for easy I/O and to recruit expertise from data scientists who don’t like –

00:05:14.720 --> 00:05:16.880
you know, who shouldn’t be burdened with learning

00:05:16.880 --> 00:05:20.560
our seismological exchange formats and so forth.

00:05:20.560 --> 00:05:26.720
So first, I’ll talk briefly about PhaseNet, which was developed using data from

00:05:26.720 --> 00:05:31.200
the Northern California Seismic Network. PhaseNet – the input is

00:05:31.200 --> 00:05:35.520
three-component seismic data. The output is picks of P waves

00:05:35.520 --> 00:05:39.520
and picks of S waves. So that probability – the green –

00:05:39.520 --> 00:05:43.840
the green here is a truncated Gaussian centered on the

00:05:43.840 --> 00:05:49.120
P wave arrival time in the catalog. This one is centered on the

00:05:49.120 --> 00:05:54.640
S wave arrival time. And the idea is that we engineer it such that the predicted

00:05:54.640 --> 00:06:00.905
probability peaks on the P wave arrival time and on the S wave arrival time.

00:06:01.680 --> 00:06:07.600
And it works. This just shows examples from different kinds of data.
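The truncated-Gaussian training target just described can be sketched in a few lines. The width and the truncation at three standard deviations below are illustrative choices, not the published PhaseNet parameters.

```python
import numpy as np

def pick_target(n_samples, pick_idx, width=20):
    """Truncated Gaussian probability mask centered on an analyst pick.

    Mirrors a PhaseNet-style training label: the probability peaks at
    the catalog arrival sample and falls to zero nearby. `width` is the
    standard deviation in samples (an assumed value, for illustration).
    """
    t = np.arange(n_samples)
    target = np.exp(-0.5 * ((t - pick_idx) / width) ** 2)
    target[np.abs(t - pick_idx) > 3 * width] = 0.0  # truncate the tails
    return target

target = pick_target(3000, pick_idx=1000)
print(target.argmax())  # peaks exactly on the pick: 1000
```

Training the network to reproduce one such curve per phase (plus a flat zero curve for noise) is what turns picking into a supervised prediction problem.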
00:06:07.600 --> 00:06:12.640
Nice data, clipped data, noisy data. There’s a lot of each kind of these

00:06:12.640 --> 00:06:18.080
situations in the – in the data set. So it learns, just like a seismologist

00:06:18.080 --> 00:06:24.456
would, to make the pick despite challenges that it might encounter.

00:06:24.480 --> 00:06:28.160
So it works quite well. So at the top is a histogram

00:06:28.160 --> 00:06:33.816
of PhaseNet pick errors for P and S waves. Below is the same for the sort of standard

00:06:33.840 --> 00:06:37.520
non-machine learning-based approach – characteristic function-based approach.

00:06:37.520 --> 00:06:42.800
We can see that the P wave is zero mean with a narrow – with a small variance –

00:06:42.800 --> 00:06:48.080
not much dispersion about that mean. The S wave is not as good as the

00:06:48.080 --> 00:06:50.720
P wave, but it’s dramatically better for the machine learning-based

00:06:50.720 --> 00:06:57.200
catalog than for the – for the sort of standard approach.

00:06:57.200 --> 00:07:01.680
And I should mention that our ground truth here is not exactly truth.

00:07:01.680 --> 00:07:04.960
These are analyst-reviewed picks, which are taken as correct,

00:07:04.960 --> 00:07:08.913
but, of course, themselves can include errors as well.

00:07:09.600 --> 00:07:12.080
I mentioned Earthquake Transformer is our next-generation

00:07:12.080 --> 00:07:15.576
earthquake detection and phase-picking algorithm.

00:07:15.600 --> 00:07:19.680
This is a very deep – meaning many-layered – neural network.

00:07:19.680 --> 00:07:22.000
In addition to its depth, there are different kinds of layers.

00:07:22.000 --> 00:07:25.760
So there are convolutional layers, which are meant to characterize

00:07:25.760 --> 00:07:30.240
the characteristic features of waveforms.

00:07:30.240 --> 00:07:33.864
Those are shown in pink.

00:07:34.800 --> 00:07:37.760
There are – shown in yellow, there are recurrent layers.
00:07:37.760 --> 00:07:41.840
These are meant to capture the sequencing [audio cuts out], say, the

00:07:41.840 --> 00:07:48.800
P wave before the S wave to properly represent what we know to be the case.

00:07:48.800 --> 00:07:54.560
And then there are these so-called attention mechanisms that use the –

00:07:54.560 --> 00:07:58.560
that focus the attention of the neural network on the parts that matter for

00:07:58.560 --> 00:08:02.560
detection, P wave arrival picking, and S wave arrival picking.

00:08:02.560 --> 00:08:06.960
So there’s seismogram output – or, input and three different outputs.

00:08:06.960 --> 00:08:09.920
One is when the earthquake is, P wave arrival time, and the

00:08:09.920 --> 00:08:13.920
S wave arrival time – the latter just like PhaseNet.

00:08:13.920 --> 00:08:16.560
It works better than PhaseNet. And the reason it works better

00:08:16.560 --> 00:08:21.016
is because, like a seismologist, it uses the waveform context

00:08:21.040 --> 00:08:23.640
to pick the P waves and the S waves.

00:08:23.640 --> 00:08:28.056
So it doesn’t just focus on the narrow little P wave arrival.

00:08:28.080 --> 00:08:34.320
It works better not only than PhaseNet, but it works better than other machine

00:08:34.320 --> 00:08:38.296
learning-based approaches for picking P waves and S waves.

00:08:38.320 --> 00:08:42.000
In addition to PhaseNet, there are four other approaches shown here.

00:08:42.000 --> 00:08:44.640
Same kind of histogram – P waves and S waves.

00:08:44.640 --> 00:08:48.800
And Earthquake Transformer, for now, is the – is the best-performing

00:08:48.800 --> 00:08:53.840
of the – of the lot. So let me show you some results that we’ve gotten

00:08:53.840 --> 00:08:59.576
using PhaseNet – not Earthquake Transformer, but PhaseNet –

00:08:59.600 --> 00:09:06.936
for phase picking as part of this earthquake monitoring workflow.
00:09:06.960 --> 00:09:10.480
We applied it to the Guy-Greenbrier, Arkansas, earthquake, already

00:09:10.480 --> 00:09:13.840
a well-studied earthquake sequence thought to be induced.

00:09:13.840 --> 00:09:16.400
We developed a complete machine learning-based catalog

00:09:16.400 --> 00:09:21.840
for this using PhaseNet. We did not re-train it.

00:09:21.840 --> 00:09:26.400
That is, this is PhaseNet as trained on NCSN data, yet it generalizes well

00:09:26.400 --> 00:09:29.760
to these smaller, closer, shallower earthquakes that occur in a different

00:09:29.760 --> 00:09:33.200
geologic setting, perhaps by a different mechanism.

00:09:33.200 --> 00:09:38.080
There are 90,000 events in the catalog. There would be hundreds of thousands

00:09:38.080 --> 00:09:44.536
if we had more stations to work with. Our stations are very limited.

00:09:44.560 --> 00:09:48.320
These events, as I said, were thought to be induced – or thought to be induced

00:09:48.320 --> 00:09:53.280
through a combination of hydraulic stimulation and deep disposal wells.

00:09:53.280 --> 00:09:57.280
Clara Yoon showed the importance of hydraulic stimulation

00:09:57.280 --> 00:10:01.120
to the early parts of this sequence.

00:10:01.120 --> 00:10:04.480
So what did we learn? Well, we learned that this is actually

00:10:04.480 --> 00:10:09.520
the superposition of two sequences. So what this shows is latitude

00:10:09.520 --> 00:10:14.720
versus calendar time – the evolution of seismicity in our catalog.

00:10:14.720 --> 00:10:17.200
You can see these characteristic parabolic shapes.

00:10:17.200 --> 00:10:20.216
You can fit a diffusivity to that. It’s quite high.

00:10:20.240 --> 00:10:24.056
Order of 10 to 100 meters squared per second.
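The parabolic shapes and the diffusivity fit come from the standard pore-pressure triggering-front relation r(t) = sqrt(4πDt): distance from the injection point grows as the square root of time, so the slope of r² versus t gives the diffusivity D. A minimal sketch with synthetic numbers (not the Guy-Greenbrier data):

```python
import numpy as np

# Triggering front for pore-pressure diffusion: r(t) = sqrt(4*pi*D*t).
# Given front positions r_i at times t_i, D follows from a straight-line
# fit of r^2 against t. D_true is an assumed value inside the
# 10-100 m^2/s range quoted in the talk.
D_true = 30.0                                   # m^2/s
t = np.linspace(3600, 30 * 86400, 50)           # 1 hour to 30 days, in s
r = np.sqrt(4 * np.pi * D_true * t)             # front distance, in m

# slope of r^2 vs. t, divided by 4*pi, recovers the diffusivity
D_est = np.polyfit(t, r**2, 1)[0] / (4 * np.pi)
print(round(D_est, 1))  # -> 30.0
```

In practice one fits the envelope of the earliest events at each distance rather than a noiseless curve, but the arithmetic is the same.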
00:10:24.080 --> 00:10:29.440
Also notice that the sequence starts up here, propagates to the south,

00:10:29.440 --> 00:10:33.440
and then sort of putters on, and then it re-nucleates – or, another

00:10:33.440 --> 00:10:37.840
sequence nucleates down here and spreads back towards where the

00:10:37.840 --> 00:10:41.920
sequence began as well as to the south. And it’s the second part of the sequence

00:10:41.920 --> 00:10:46.376
that had the magnitude 4.6 that shut down the –

00:10:46.400 --> 00:10:49.840
shut down the wells – 1 and 5. And yet, this – you know,

00:10:49.840 --> 00:10:56.695
there’s a chance that this Well 2 was involved in triggering this sequence

00:10:56.720 --> 00:10:59.600
because that sequence actually nucleated farther to the south.

00:10:59.600 --> 00:11:02.400
So some interesting new insights into this already

00:11:02.400 --> 00:11:05.336
well-studied earthquake sequence.

00:11:05.360 --> 00:11:10.080
We’ve applied PhaseNet to the Amatrice – well, to the Apennines

00:11:10.080 --> 00:11:16.080
sequence – the 2016-2017 part of that. It’s a complex normal-faulting

00:11:16.080 --> 00:11:21.336
sequence. It killed hundreds of people and left tens of thousands of people

00:11:21.360 --> 00:11:25.680
homeless in central Italy. And here, again, is a catalog comparison.

00:11:25.680 --> 00:11:30.160
The top four panels show the standard catalog – a very good catalog

00:11:30.160 --> 00:11:36.080
developed by scientists at INGV. It has 82,000 events in it during

00:11:36.080 --> 00:11:39.600
that one-year time period. That’s about one earthquake every six minutes.

00:11:39.600 --> 00:11:45.920
The PhaseNet catalog has over 10 times as many events – over 900,000 events.

00:11:45.920 --> 00:11:49.360
That’s one every 35 seconds. These are all really tiny earthquakes –

00:11:49.360 --> 00:11:53.200
all the new ones.
But they are illuminating new structures, and we’re

00:11:53.200 --> 00:11:59.096
in the very early stages of figuring out what more we might learn from them.

00:11:59.120 --> 00:12:02.880
Now I’d like to close by previewing some things that are coming soon –

00:12:02.880 --> 00:12:06.080
within the next year or so. One is a comprehensive application

00:12:06.080 --> 00:12:10.880
of PhaseNet to NCSN data led by Ian McBrearty and in collaboration

00:12:10.880 --> 00:12:15.040
with the Berkeley Seismo Lab. Richard Allen, thank you for buying

00:12:15.040 --> 00:12:18.480
a workstation so that we could run it over there and not move

00:12:18.480 --> 00:12:21.840
all that data to Stanford. Preliminary sampling –

00:12:21.840 --> 00:12:25.840
sort of random sampling – suggests hundreds of thousands of phase picks

00:12:25.840 --> 00:12:31.920
per day and about one earthquake per minute from this – from decades

00:12:31.920 --> 00:12:36.216
of continuous seismic data. So we’re going to have a lot to sift through.

00:12:36.240 --> 00:12:40.320
Every time I talk about this to regional seismic network operators,

00:12:40.320 --> 00:12:44.400
they ask about Pn and Sn. We have not worked on that.

00:12:44.400 --> 00:12:48.040
You know, because of the distances we use, we don’t have those in the catalog,

00:12:48.040 --> 00:12:52.960
so we’re working on an augmented label data set that has these difficult-to-pick

00:12:52.960 --> 00:12:59.736
phases that we will add to the STEAD database of local event phases.

00:12:59.760 --> 00:13:02.640
The one thing we haven’t done with deep learning is association.

00:13:02.640 --> 00:13:07.520
We have two approaches to that – one by Ian McBrearty using graph

00:13:07.520 --> 00:13:12.960
convolutional neural networks. The other by Weiqiang Zhu using

00:13:12.960 --> 00:13:17.600
a Gaussian mixture model approach.
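As a toy illustration of the association idea (not Zhu’s actual algorithm): with an assumed constant velocity, each P pick back-projects to a candidate origin time, and picks belonging to the same event cluster tightly in that coordinate. A Gaussian mixture model does this clustering probabilistically; the sketch below, on synthetic noiseless picks, uses simple gap splitting instead.

```python
import numpy as np

vp = 6000.0                                    # assumed P velocity, m/s
station_dist = np.array([5e3, 12e3, 20e3])     # station distances, m

true_origins = np.array([10.0, 75.0])          # two synthetic events, s
# arrival time = origin time + travel time, for every (event, station)
arrivals = (true_origins[:, None] + station_dist[None, :] / vp).ravel()
dists = np.tile(station_dist, 2)

# back-project each pick to a candidate origin time
origins = arrivals - dists / vp

# cluster by gap splitting (a mixture model would do this probabilistically,
# and could fold in amplitudes and locations as extra dimensions)
order = np.argsort(origins)
groups, current = [], [order[0]]
for i in order[1:]:
    if origins[i] - origins[current[-1]] > 5.0:  # 5 s gap threshold
        groups.append(current)
        current = []
    current.append(i)
groups.append(current)
print(len(groups))  # -> 2 clusters, i.e. two associated events
```

Real catalogs are messier: unknown locations, picking errors, and missed picks are exactly why a probabilistic mixture model beats a fixed threshold.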
Both these approaches are designed

00:13:17.600 --> 00:13:22.400
to work for heterogeneous catalogs that are large, that vary with time,

00:13:22.400 --> 00:13:25.656
on which small events are detected by only a few stations.

00:13:25.680 --> 00:13:28.880
Both methods use arrival time and amplitude information –

00:13:28.880 --> 00:13:34.640
at least they can. Also coming soon, led by Weiqiang Zhu, is an end-to-end

00:13:34.640 --> 00:13:41.280
method that combines these different modules into a single linked neural

00:13:41.280 --> 00:13:44.400
network. The advantage of this is that information is not lost through

00:13:44.400 --> 00:13:48.480
thresholding and, as suggested by the inset in the lower right,

00:13:48.480 --> 00:13:54.624
we’ve applied it to Ridgecrest, and it seems to work quite well.

00:13:55.520 --> 00:14:01.040
Finally, working with former postdoc Lise Retailleau and engineers at IPGP,

00:14:01.040 --> 00:14:06.400
we’ve applied PhaseNet to seismicity on the volcano – this is actually

00:14:06.400 --> 00:14:08.960
a volcano near Mayotte. It detects more events,

00:14:08.960 --> 00:14:13.280
despite not being retrained, than other methods, including SeisComP3.

00:14:13.280 --> 00:14:17.280
It improves locations. It’s now working in real time.

00:14:17.280 --> 00:14:22.320
It will be installed at the Martinique Volcano and Seismic Observatory

00:14:22.320 --> 00:14:26.480
later this year. And we, or they, are working on a modification

00:14:26.480 --> 00:14:30.480
of binder to exploit PhaseNet’s ability to pick P and S separately,

00:14:30.480 --> 00:14:32.640
which will lead to further improvements in the catalog.

00:14:32.640 --> 00:14:37.600
And you can see here PhaseNet versus the standard catalog.

00:14:37.600 --> 00:14:42.080
Again, we have much, much more information to work with.

00:14:42.080 --> 00:14:47.080
So I’ll stop there and take any questions when the time comes. Thank you.

00:14:49.365 --> 00:14:56.685
[silence]