OVH Crash

WWW

Thank God, I see that I'm in SBG3.

They posted an update that SBG3 and SBG4 are fine, thank God, there's just no network. Why? Unclear.
Too bad they don't at least give me a way to get the data out of there.

Summary:
• At 00:47 on Wednesday 10 March 2021, a fire broke out in a room at one of our four
OVHcloud datacentres in Strasbourg (SBG2).
• The fire was contained by the early hours of the morning.
• There are no injuries.
• The fire mostly destroyed the SBG2 datacentre and partially damaged the SBG1
datacentre (4 of the 12 rooms destroyed). The two other OVHcloud datacentres in
Strasbourg were not affected by the fire; the SBG3 and SBG4 servers are currently
switched off but undamaged.
• The site is not classified as a Seveso site.
• The cause of the fire has yet to be established and an investigation has been launched as
mandated by the authorities.

Actions taken by OVHcloud:
• The technical and commercial teams have been working since this morning to inform our
customers and handle the unavailability of our Strasbourg site.
• The company’s founder, Octave Klaba, has been on site since this morning with the
industrial and technical teams.
• To follow updates to this situation in real time, see Twitter: https://twitter.com/ovhcloud
• A customer announcement, along with an FAQ, will be available tomorrow morning.
• To help us handle customer requests, we recommend using the ticket feature on our
website or consulting our help centre.

Our three priorities are as follows:
1. Reserve infrastructures at our other datacentres for our affected customers: we
have a stock of new servers at the Roubaix and Gravelines sites, ready to be delivered to
the majority of affected customers. We will further enhance availability in these
datacentres, with the production of nearly 10,000 new servers in the coming weeks.
Affected customers will be notified about this process as soon as possible.
2. Secure the site now that we have regained access, clean it up, and reconnect the
electricity and the network for the three affected datacentres.
3. Continue to assess the impact on our customers’ servers at the affected
datacentres, in order to find the best solutions.
We are doing everything we can to ensure continuity of service to our customers:
• We are working on a plan to relaunch the two unaffected datacentres (SBG3 and SBG4),
the partially affected datacentre (SBG1), as well as our network, as quickly as possible.
• We ask that our customers exercise caution around the emails they receive: in times of
crisis, it is common for malicious activity (phishing, spam, etc.) to increase. It is more
important than ever to stay alert.
Impact on our operations:
• We are continuing to assess the impact of this incident, particularly for the customers
whose data was located in the datacentre destroyed by the fire.
• Our Web-based VOIP services in France were not affected.
• All of our services in our other France-based datacentres and across the world (including
15 datacentres in Europe) are fully operational.

Our mission is to provide our customers with the highest quality of services to support their
online activities and we know how important this is to them. We sincerely apologise for the
issues caused by this fire. We will continue to communicate with the greatest transparency
about the cause of the fire and its consequences.

We are assessing the environmental impact by working with the relevant authorities on a
procedure to confirm that no pollution was caused.

At this stage, we can confirm that the local residents are not at any risk.

We are continuously assessing the impact of this incident and will communicate as transparently
as possible on the progress of our analyses and the solutions to be implemented.

All of our communication channels, including our incident tracking platform, can be accessed so
that you can stay informed of developments in real time.

Comment by OVH - Thursday, 11 March 2021, 11:52AM
Initial cleaning and securing work on the datacentre started this morning.
Our teams are preparing a document summarizing the impacted services for each building. This document aims to give you visibility on the recovery status of your services and data in the coming days.
An FAQ will soon be available to help you determine the location of your products and to answer your questions.

http://travaux.ovh.net/?do=details&id=49484

MusiCode

If anyone here has a clear solution for
how to avoid a situation like this,

I'd be glad to hear it.

nigun

@musicode
There was a discussion about this in the Exclusive section.
The conclusion was to use Kubernetes-style systems and the like,
so as not to depend on a single server.
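Not depending on a single server comes down to having more than one place that can serve the application, plus a way to route around whichever one is down. Kubernetes and similar orchestrators do this with health checks and rescheduling; the snippet below is not Kubernetes' API, only a minimal Python sketch of the underlying failover idea. The endpoint names and the `is_healthy` check are hypothetical.

```python
from typing import Callable, Sequence


def first_healthy(endpoints: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first endpoint, in priority order, that passes the health check."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    # Every replica is down: fail loudly instead of silently returning nothing.
    raise RuntimeError("no healthy endpoint available")


# Hypothetical endpoints: one primary and two standbys, each in a different site.
ENDPOINTS = ["app.site-a.example", "app.site-b.example", "app.site-c.example"]
DOWN = {"app.site-a.example"}  # simulate losing the primary site


if __name__ == "__main__":
    print(first_healthy(ENDPOINTS, lambda e: e not in DOWN))  # app.site-b.example
```

This only keeps the service reachable; it says nothing about whether the standby sites hold an up-to-date copy of the data.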

שמואל4

@nigun said in OVH Crash:

@musicode
There was a discussion about this in the Exclusive section.
The conclusion was to use Kubernetes-style systems and the like,
so as not to depend on a single server.

But that doesn't help against data loss.

Granted, a fire breaking out in a server farm, abroad no less, isn't an everyday occurrence, but still, things like this can end with data simply gone and nothing to be done about it.
I don't know whether all the customers of the facility that burned got their data back, and if the company really had no backup of it, insurance can't do much to help.

Also, I'm curious how a fire in a server farm gets to the point where an entire building goes up in flames... What could have happened? A server caught fire? There should be automatic systems to put out things like that, or at least someone physically on site who would know about it immediately and deal with it on the spot.

nigun

@שמואל4 said in OVH Crash:

@nigun said in OVH Crash:

@musicode
There was a discussion about this in the Exclusive section.
The conclusion was to use Kubernetes-style systems and the like,
so as not to depend on a single server.

But that doesn't help against data loss.

It does help,
when you store every piece of data in the system in several places.
That of course requires a different architecture,
but it's possible.
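Storing every piece of data in several places can be sketched as a quorum write: each write goes to all replicas and counts as successful only once a minimum number of them (the quorum) acknowledge it, so losing one site does not lose the data. This is a minimal in-memory sketch; `Site`, the site names and the quorum value are hypothetical, and it leaves out the versioning, replica repair and quorum reads that real distributed stores rely on.

```python
from typing import Dict, List, Optional


class Site:
    """A hypothetical storage location (e.g. one datacenter), kept in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> bool:
        if not self.up:
            return False  # an unreachable site cannot acknowledge the write
        self.data[key] = value
        return True

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key) if self.up else None


def replicated_put(sites: List[Site], key: str, value: bytes, quorum: int) -> bool:
    """Send the write to every site; report success only if >= quorum acknowledged."""
    acks = sum(1 for site in sites if site.put(key, value))
    return acks >= quorum


def replicated_get(sites: List[Site], key: str) -> Optional[bytes]:
    """Return the value from the first reachable site that holds the key."""
    for site in sites:
        value = site.get(key)
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    sites = [Site("site-a"), Site("site-b"), Site("site-c")]
    assert replicated_put(sites, "invoice-42", b"paid", quorum=2)
    sites[0].up = False  # site-a burns down
    print(replicated_get(sites, "invoice-42"))  # b'paid'
```

Two simplifications to be aware of: a write that misses the quorum may still have landed on some replicas, and `replicated_get` can return a stale value from a replica that missed a later write. Real systems handle both with version numbers and read quorums.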

Also, I'm curious how a fire in a server farm gets to the point where an entire building goes up in flames... What could have happened? A server caught fire? There should be automatic systems to put out things like that, or at least someone physically on site who would know about it immediately and deal with it on the spot.

There's "supposed" to be a fire suppression system.
My guess is they cut corners on it,
probably to save money (you can't pay pennies and get enterprise-grade service).

MusiCode

Look at what it actually is:
a pile of containers (not managed by Docker; a geek joke), or something like that.

At Google, the data center is underground.

Israeli data centers too (Binat in Jerusalem) are underground, if I'm not mistaken.


nigun

@musicode
Some of the images are on sites that NetFree blocks.
Try uploading them directly here.

nigun

@שמואל4
Here's an article that brings some order to the topic of HA (high availability):
https://www.linode.com/blog/cloud-consulting-services/high-availability-what-you-need-to-know/

WWW

They started restoring the servers, and...
Today at 6:50 pm CET, our teams on site have identified smoke in an unconnected battery room at SBG1.

The fire department was immediately contacted. Given the situation, we decided to apply the precautionary principle: the power supply of SBG1 and SBG4 was immediately cut off. Operations on site were temporarily interrupted, and all OVHcloud and partner teams were asked to return home. Only the security, fire safety and maintenance staff remained on site.

Once on site, firefighters identified the source in the battery room, and the incident was contained within minutes. No injuries were reported among our teams or our partners. As a precautionary measure, two security people who were affected by the smoke were examined by medical professionals.

The situation is under control and we are adapting the operations schedule for the coming day.

It would be funny if it weren't so sad...

MusiCode

An email from Scaleway's CEO:

Hi,
We are sure that the recent, unfortunate event that struck one of our fellow cloud service providers in the worst possible way, a fire in their datacenters, has not escaped your attention. This is a rare event with tangible consequences that reach beyond the operator: they affect the entire digital industry, and potentially a good share of our society that is directly or indirectly connected to it.

We have extended our support, and we continue to stand with those affected, on both the operator's side and the customers' side, who will spend countless hours in the coming weeks and months rebuilding infrastructure (real estate, physical and software), network routes, databases and datasets, workflows and applications, and ultimately, trust.
When an event of this kind occurs, we are faced with the undeniable fact that our society has grown highly dependent on computing and data storage, and on the facilities that carry a large share of those workloads. Worldwide, there is a fairly short list of a dozen or so cloud operators running nonstop, 24/7, to sustain over 90% of the world's digital activity, and Scaleway is one such operator.

Despite careful management by professionals and industry experts, cataclysmic accidents can happen, albeit rarely. We can only reassure you that all stakeholders in our sector do everything in their power to prevent any kind of incident, whether environmental, infrastructural, power-related or other, preventable or unforeseeable. Unfortunately, ISO standards alone are not enough, because they do not currently address fire protection of physical assets in datacenters. Each operator is therefore responsible for its own design, its fire protection policy, and the investment it decides to make to protect against such perceived risks.
Scaleway's infrastructure has always been designed to the highest standards, in line with best practices, no matter what, to ensure that we offer optimal resilience and security. Furthermore, our datacenter infrastructure is, and always has been, open to our customers and their insurers for audit purposes.

There is a lesson to be learned from this event: dependency is the enemy of resilience. We will look inward at Scaleway to learn from this event and map our dependencies even further. Dependency on a single server, or on a single datacenter itself, is a liability that must be offset with a multi-AZ or multi-region approach. Similarly, the global digital economy is at risk because of too strong a dependency on a few dominant cloud providers, and as we have seen, the whole internet goes down from time to time because of these dependencies. The burden is on us, the cloud providers, to provide as much resilience as possible, but it comes at a cost: redundancy. This redundancy is also something cloud customers must account for, by designing separate backup plans and drawing up their own disaster recovery plans.
The good news is that cloud-native applications can build in redundancy easily, by distributing their applications and data across cloud providers using established standards.
I wrote the piece "The cloud is dead, long live the multi-cloud" a year ago, and it still rings true today, all the more so after this event. The multi-cloud approach provides the ultimate level of flexibility. I believe that at Scaleway we have the humility to recognize that our cloud is not right for everyone. We believe no cloud provider can be. That is why interoperability matters. Cloud architects already know the core concepts and already use Kubernetes, load balancers and Terraform to abstract things away and provide resilience.
In conclusion, don't put off thinking about resilience any longer. Work with your cloud architects and system administrators. Talk to your cloud providers, all of them. Distribute your workloads and assets. Ask the hard questions.
Today we call on all stakeholders to be transparent about their security measures. Together we are building tomorrow's digital society. If you have any questions, you can reach us at transparency@scaleway.com.

Yann Lechelle
CEO
Scaleway, the sensible cloud
Twitter - LinkedIn
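The email's advice to map dependencies and spread data across AZs, regions and providers can be turned into a simple check: given the locations holding copies of some data (primary plus backups), list every failure domain that contains all of them. A backup kept in the same building as the primary shows up as a single point of failure at the site level. This is a minimal sketch; the `Placement` fields and the example labels are illustrative assumptions, not any provider's API.

```python
from dataclasses import dataclass
from typing import Iterable, List

LEVELS = ("provider", "region", "site")


@dataclass(frozen=True)
class Placement:
    """Where one copy of the data lives (all values are illustrative labels)."""
    provider: str
    region: str
    site: str


def single_points_of_failure(copies: Iterable[Placement]) -> List[str]:
    """Return every failure domain that holds all copies.

    Losing any domain in the result loses every copy at once. Domains are
    built hierarchically (provider, provider/region, provider/region/site),
    so identical site names under different providers are not confused.
    """
    items = list(copies)
    if not items:
        return ["<no copies>"]
    spofs: List[str] = []
    for depth in range(1, len(LEVELS) + 1):
        domains = {tuple(getattr(c, f) for f in LEVELS[:depth]) for c in items}
        if len(domains) == 1:
            spofs.append("/".join(next(iter(domains))))
    return spofs


if __name__ == "__main__":
    same_building = [Placement("ovh", "strasbourg", "sbg2"), Placement("ovh", "strasbourg", "sbg2")]
    multi_region = [Placement("ovh", "strasbourg", "sbg2"), Placement("ovh", "roubaix", "rbx")]
    multi_cloud = [Placement("ovh", "strasbourg", "sbg2"), Placement("scaleway", "paris", "par1")]
    print(single_points_of_failure(same_building))  # ['ovh', 'ovh/strasbourg', 'ovh/strasbourg/sbg2']
    print(single_points_of_failure(multi_region))   # ['ovh']
    print(single_points_of_failure(multi_cloud))    # []
```

An empty result does not mean the data is safe, only that no single provider, region or site in this model can take out every copy at once.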

Tchumim - Professional Haredi Forum
