Excuse me, what just happened? Resilience is tough when your failure is due to a ‘sequence of events that was almost impossible to foresee’

 

Thinking about how to design and create resilient systems is tough, especially when the failure path is not apparent or is buried deep within the system design. So-called “Black Swan” events are once-in-a-lifetime catastrophic events whose failure paths often become apparent only after the event, but which expose flaws that in hindsight were all too apparent. Read this great article from “The Register” website on some basic steps for designing and testing your critical communication and data systems to help avoid that “Black Swan” event. https://www.theregister.com/2021/06/12/how_to_make_tech_systems_resilient/

 

For years, people had warned that New Orleans was vulnerable – but when a hurricane came close to destroying the city, the reaction was muted. Some people took the near miss as a warning – others, as confirmation that there was nothing to worry about.

So why do we struggle to prepare for disasters? And why don’t we draw the obvious lessons from clear warnings?

In the podcast “That Turn to Pascagoula”, written and presented by Tim Harford with Andrew Wright, the authors talk about why we fail to see, and fail to act to prevent, foreseeable disasters. A thought-provoking podcast for anyone interested in how we should be working to mitigate the impacts of disasters and failures before they arrive.

Additional information and readings on the subject can be found on Tim’s website.  https://timharford.com/2020/07/cautionary-tales-that-turn-to-pascagoula/

Additional suggested readings on the subject. 

Read about Hurricane Ivan in The Ostrich Paradox by Howard Kunreuther and Robert Meyer. Other forebodings of disaster are recounted in Predictable Surprises by Max Bazerman and Michael Watkins.

 

Verizon announced another delay for its planned January 2021 3G cellular network shutdown, but time is probably running out for 3G. All the major cellular carriers are planning to shut down their 3G networks in the near future. With these impending shutdowns, your organization should have a plan in place to verify whether wireless devices are still using 3G networks and should be actively implementing a migration plan for any soon-to-be-obsolete solutions.

Bill Menezes, director analyst at tech research firm Gartner, said “Verizon’s decision to delay its 3G shutdown probably has less to do with mobile phone customers still using the network, which are likely few in number, and more to do with ‘internet of things’ devices, such as smart utility meters and home burglar alarms that are still connected to 3G.” For the corporate market, SCADA, IoT, transportation tracking, and networking devices are the prime candidates for systems still using 3G technology. Network backup solutions, remote monitoring, and telematics solutions should all be reviewed to see if they are using 3G cellular connections. Network and communication managers should give special consideration to edge routers that use 3G for WAN backup or out-of-band remote access, as the obsolete 3G connection may not become apparent until the primary connection fails. Think about those CradlePoint routers you deployed to a branch office a few years ago with a 3G SIM card.

During a January 5th interview with the website “LightReading”, Verizon spokesperson Kevin King was quoted as saying “our 3G network is operational, and we don’t have a plan to shut it down at this time.” This signals another delay in Verizon’s 3G shutdown plans, which had previously been announced for the end of 2019 and again for 2020. Verizon and other carriers stopped activating 3G devices in 2018, but many 3G devices are still operational in the field. In 2021, 3G mobile connections are expected to shrink to 5.7% of total cellular connections, but that is still more than 25 million devices in the US. This latest delay is only a reprieve for anyone still using 3G devices, as the spokesperson clarified that Verizon is still planning to shutter the 3G network but has not announced the new date.

It is expected that the carriers will sunset their 3G networks rather than implementing complete shutdowns on a given date. A sunset strategy may see the carrier’s 3G network equipment decommissioned as it fails or as cell towers are upgraded. The sunset strategy may help extend the time available to move services to 4G networks, but it also means that failures will depend on local 3G network availability, so many organizations may not notice that their 3G devices are slowly failing. A failure of IoT, SCADA, and other devices using 3G networking could cause some nasty surprises for organizations and lead to emergency or unplanned equipment replacement and upgrades. Now is the time to prepare so you can identify what needs to be upgraded, get your budgets approved, and start the process of making the required changes and equipment updates.
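As a starting point for identifying what needs to be upgraded, the review described above could be sketched as a simple inventory check. This is a minimal illustration only: the `Device` fields, the device names, and the list of 3G technology labels are assumptions made for the example, and a real asset inventory will likely use different field names and values.

```python
from dataclasses import dataclass

# Assumed labels for 3G-era radio technologies; adjust to match your inventory.
THREE_G_LABELS = {"3G", "UMTS", "HSPA", "HSPA+", "EVDO", "CDMA2000"}


@dataclass
class Device:
    name: str     # hypothetical asset name
    role: str     # e.g. "WAN backup", "SCADA", "telematics"
    radio: str    # highest cellular technology the modem supports
    carrier: str


def find_3g_devices(inventory):
    """Return the devices whose modem tops out at a 3G technology."""
    return [d for d in inventory if d.radio.strip().upper() in THREE_G_LABELS]


def summarize_by_role(devices):
    """Count at-risk devices per role to help size the migration budget."""
    counts = {}
    for d in devices:
        counts[d.role] = counts.get(d.role, 0) + 1
    return counts


# Made-up sample inventory for illustration.
inventory = [
    Device("branch-router-01", "WAN backup", "3G", "Verizon"),
    Device("meter-gw-07", "SCADA", "LTE", "AT&T"),
    Device("truck-tracker-12", "telematics", "evdo", "Verizon"),
]

at_risk = find_3g_devices(inventory)
for d in at_risk:
    print(f"{d.name}: {d.role} on {d.carrier} ({d.radio}) needs a migration plan")
```

Grouping the results by role with `summarize_by_role(at_risk)` gives a rough count per service type, which may help when preparing a budget request.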

AT&T’s website provides a February 2022 date for its planned 3G network shutdown (https://www.att.com/support/article/wireless/KM1324171), while T-Mobile hints that it plans to sunset its 3G network but has made no official announcement. T-Mobile’s website currently provides no information and no planned shutdown date.

 

Sources

“Verizon indefinitely delays 3G network shutdown”

By Mike Dano, LightReading, Jan 5, 2021

 

“When Will 3G Be Retired? Here are the Timelines for Verizon, T-Mobile & AT&T”

By Jess Barnes, Cord Cutters News, Aug 21, 2020

 

“Verizon delaying shutdown of its 3G wireless network”

By Clare Duffy, CNN Business, January 7, 2021

 

“3G Networks are Becoming Extinct”

By A. Conklin, Yahoo Finance, March 26, 2020

 

Verizon Support Website

https://www.verizon.com/support/knowledge-base-218813/

AT&T Support Website

https://www.att.com/support/article/wireless/KM1324171

SpaceX donated seven Starlink satellite terminals to Washington State’s Emergency Management Department (EMD) to provide emergency data services as part of the State’s 2020 wildfire response. Some of the terminals were deployed to provide internet connectivity for Malden, a small Washington town where an estimated 80% of the homes were destroyed. The Starlink system was quickly set up to provide emergency communications and broadband services for local residents.

Image: Starlink data service in Malden during the 2020 wildfire response. Image credit: WA EMD

Steven Friederich, with EMD, explained, “the fire came through town and it burned a good chunk of the area, including the fire station and the post office. There simply hasn’t been a way to get a fast and reliable Internet connection there for the public to use…This is a device we could definitely utilize should we have more wildfires or even larger disasters, such as a Cascadia Subduction earthquake event, where communication problems would be a huge hurdle.”

The terminals are also being used for incident command at the Bonney Lake wildfire.  Richard Hall with Washington State’s Military Department IT was quoted as saying “I have never set up any tactical satellite equipment that has been as quick to set up, and anywhere near as reliable.”

The Starlink terminals are part of a new satellite network service currently being deployed by SpaceX. The new satellite-based broadband service promises low latency and high bandwidth. While the system is still in development, SpaceX has published testing results showing a 30-millisecond latency and bandwidth of up to 60 Mbps. The company is preparing to launch a public beta for Starlink later this year for residents in the northern US and Canada.

As already demonstrated, the Starlink system may give Emergency Management and Healthcare IT/Communications teams access to new satellite-based data services for use in emergency situations and as a backup for traditional land-based systems. These new satellite-based data networks will be worth watching over the next few years to see whether they can provide critical data network connectivity even if land-based systems fail or are damaged in emergency situations.

 

Sources

SpaceX Is Providing Satellite Internet Service to Towns Hit by Wildfires

By Michael Kan, PCMag, September 29, 2020

https://www.pcmag.com/news/spacex-is-providing-satellite-internet-service-to-towns-hit-by-wildfires

 

 

SpaceX’s Satellite Internet Plans for Mid-2020 Launch in the US

By Michael Kan, PCMag, October 23, 2019

https://www.pcmag.com/news/spacexs-satellite-internet-plans-for-mid-2020-launch-in-the-us

 

Starlink puts towns devastated by wildfires online for disaster relief workers

By Devin Coldewey, TechCrunch, September 29, 2020


The Internet experienced another large-scale outage on July 17th, 2020, that prevented access to many major websites and Internet-based services for about 25 minutes. The outage was the result of a problem with Internet DNS services provided by Cloudflare, impacting Internet services in many parts of the US, including the Seattle area.

Cloudflare, a company that is not a household name, provides world-wide Internet services such as DNS services, Content Delivery Networks, and Denial of Service protection for over 500,000 companies, many in the US and in the Hospital and Health Care industry. Cloudflare stressed that this was not a hacker attack, but a simple configuration error.

The outage follows similar outages over the last two years involving Cloudflare, Verizon, and T-Mobile, highlighting the highly interdependent nature of today’s Internet.

Cloudflare’s company blog page provided a brief technical overview of the outage.

“… while working on an unrelated issue with a segment of the backbone from Newark to Chicago, our network engineering team updated the configuration on a router… This configuration contained an error that caused all traffic across our backbone to be sent to Atlanta. This quickly overwhelmed the Atlanta router and caused Cloudflare network locations connected to the backbone to fail.” This simple error by a single engineering team impacted 50% of the traffic on their network.

These outages provide an insight into issues that everyone in the Health Care IT and Communications communities should consider when planning the resiliency of their critical or essential services. Today’s Internet provides critical links in our IT and communications infrastructures which we may not even consider until the service is unavailable. A simple example: internal pager services often use an email gateway interface to your paging vendor, which often sends this traffic over the Internet. As these nationwide outages demonstrate, it is critical that we are aware of the links in the chain that make up these services and plan alternate means of providing the service during a failure event. See the PUSHECS “Ideas and Concepts” page for more information on PACE planning and Resiliency Planning.
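To make the idea of planned alternates concrete, here is a minimal sketch of a PACE (Primary, Alternate, Contingency, Emergency) fallback for the paging example. The channel names and the stand-in send functions are hypothetical and only simulate success or failure; a real plan would substitute the organization’s actual paging, messaging, radio, and manual procedures.

```python
# PACE order: Primary, Alternate, Contingency, Emergency.
PACE_LEVELS = ["Primary", "Alternate", "Contingency", "Emergency"]


def send_with_pace(message, channels):
    """Try each (name, send_function) pair in PACE order.

    Returns a label for the first channel that reports success,
    or None if every channel fails.
    """
    for level, (name, send) in zip(PACE_LEVELS, channels):
        try:
            if send(message):
                return f"{level}: {name}"
        except Exception:
            # A failed channel is expected during an outage; move to the next one.
            continue
    return None


# Hypothetical stand-ins that simulate an Internet outage.
def email_gateway(message):
    raise ConnectionError("Internet path to paging vendor unavailable")


def cellular_sms(message):
    return False  # simulated: carrier network congested


def two_way_radio(message):
    return True   # simulated: on-site radio works


def runner(message):
    return True   # simulated: hand-carried message


channels = [
    ("email gateway to paging vendor", email_gateway),
    ("cellular SMS", cellular_sms),
    ("two-way radio", two_way_radio),
    ("runner", runner),
]

result = send_with_pace("Code team to ED", channels)
print(result)
```

The point of the sketch is the ordering, not the code itself: each level should rely on different infrastructure from the one before it, so that a single Internet or carrier outage cannot take out the whole chain.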

Check out this great article from EMS1.com by Gary Sparger on how Emergency Management must consider how they engage with social media during an emergency. https://www.ems1.com/community-awareness/articles/emergency-communication-in-the-age-of-social-media-TRh2sJPFlNOknjSY/

“In 2017 the Pew Research Center reported that nearly 70% of American adults used social media, an increase from 5% in 2005. And because most people are using social media, agencies responsible for emergency response need to use it, too… The agency that fails to engage on social media runs the risk of being caught flat-footed when a disaster occurs.”

“Many departments use Facebook already, typically for relaying community event information. In terms of disasters, though, Facebook may be underutilized by first responders. Facebook monitors the posts on its site, and if people in a concentrated geographical area begin talking about a disaster, Facebook will open a page related to the disaster and prompt nearby users to report that they are safe. Departments can also monitor these pages and mine them for intel about conditions on the ground.”

Click here for the PUSHECS page on Developing Social Media Plan for Emergency Communications

Click here for the PUSHECS page on Facebook’s Crisis Response tools

 

Facebook’s Local Alerts tool is designed to help local governments and first responders keep people in their communities safe and in-the-know.

Since early 2018, Facebook has partnered with local authorities across the country to help them communicate urgent, need-to-know information when it directly affects people in their communities or requires them to take action. When local authorities mark posts as local alerts, Facebook amplifies their reach so that people living in an affected community are much more likely to see the alert. Facebook sends notifications to people living in the affected area, and also shows that information on Today In, a new place on Facebook for local news, community information, and conversations between neighbors.

Learn more about how your organization can take advantage of this program: https://www.facebook.com/gpa/blog/expanding-local-alerts

The raging bushfires in Australia’s New South Wales region are affecting local power and telecommunications systems in large parts of the impacted area. Local government and Police Services issued alerts to local residents that “all telecommunication services, including mobile phones and internet, will cease between Nowra and Moruya on the night of Tuesday 12/31/2019 due to ongoing bush fires. This will affect hospitals in the area.” The Minister of Communications stated, “Many of the outages are due to power supplies being cut off, and in some cases are the direct impact of fire on network infrastructure.” The outages were expected to last for a number of days.

Australia’s current crisis highlights the need for healthcare systems to plan in advance for how they would continue operations and communications in the event of significant infrastructure failures that may impact entire regions in a disaster. In this case, the loss of landline, cellular, and internet services highlights the interconnectedness of today’s network infrastructure and stresses the need to think about emergency communications at multiple levels. To help local healthcare organizations prepare for such incidents, check out the PUSHECS website’s “Ideas and Concepts” page for information on Resiliency Planning and the use of PACE Plans, with links to information and resources.

See related articles:

http://www.arrl.org/news/australian-bushfires-causing-major-telecommunication-outages-hams-asked-to-remain-alert

https://www.garda.com/crisis24/news-alerts/300801/australia-telecommunications-disruptions-expected-in-new-south-wales-due-to-bushfires-december-31-update-23

https://www.miragenews.com/maintaining-resilience-and-repairing-telecommunications-in-bushfire-affected-areas/

FirstNet announced two new airborne mobile cell site options in December 2019, providing significant enhancements to the FirstNet emergency and disaster response capability. The first system, nicknamed the Flying COW, is a drone-based system; the second mounts the cellular equipment on a blimp. Both systems are managed by the AT&T Network Disaster Response (NDR) team.

The Flying COW drone system provides a rapid, highly mobile response capability. This system is designed for smaller incidents or for deployment in remote or mountainous areas that are often difficult to cover with existing ground-based systems. The drone, with its small cell and associated antennas, is connected to a ground vehicle by a tether. The tether combines a fiber optic cable and a power cable, which gives the drone unlimited flight time and a highly secure data link. The ground station then uses a satellite link to transport the communications.

When airborne the Flying COW can provide LTE coverage for up to 40 square miles.  The system can be combined with other Flying COWs to provide an even larger footprint.

The blimp-based system is designed for larger incidents and longer-term deployment. Like the drone, the blimp is attached to a ground station via a tether providing the communications and power links. The system can fly to 1,000 feet, providing two times the coverage of cell sites on wheels or the drone solution. The blimp can stay aloft for up to 2 weeks before it has to be topped up with helium.

FirstNet has over 75 portable assets available 24/7 at no additional charge to FirstNet subscribers for potential use during major incidents.  

Learn more about the FirstNet Flying COWs

https://about.att.com/innovationblog/cows_fly

Learn more about the FirstNet Blimp

https://about.att.com/story/2019/fn_hits_one_million.html

Learn more about the FirstNet NDR Team

https://about.att.com/pages/disaster_relief/network_recovery

 

Check out this great article exploring key lessons from past disaster responses and the need to not only learn but institutionalize these lessons so we don’t continue to make the same mistakes during our next emergency or incident: “Lessons We Don’t Learn: A Study of the Lessons of Disasters, Why We Repeat Them, and How We Can Learn Them”. Authors Amy Donahue and Robert Tuohy published this work in “Homeland Security Affairs”, the journal of the NPS Center for Homeland Defense and Security, Article 4 (July 2006). https://www.hsaj.org/articles/167

ABSTRACT: Emergency responders intervene before and during disasters to save lives and property. The uncertainty and infrequency of disasters make it hard for responders to validate that their response strategies will be effective, however. As a result, emergency response organizations use processes for identifying and disseminating lessons in hopes that they and others will be able to learn from past experience and improve future responses. But the term “lessons learned” may be a misnomer. Anecdotal evidence suggests mistakes are repeated incident after incident. It appears that while identifying lessons is relatively straightforward, true learning is much harder – lessons tend to be isolated and perishable, rather than generalized and institutionalized. That we see problems persist is a serious concern; as emergency response missions expand to include broader homeland security responsibilities, the ability to capitalize on experience is ever more important. This article reports the results of a qualitative study of both the lessons themselves and the efficacy of the processes by which responders hope to learn them.