Square Enix Cuts Troubleshooting Time From Weeks to Seconds by Gaining Data-Driven Visibility


Executive Summary

Creator of beloved games like Final Fantasy, Dragon Quest and Tomb Raider, Square Enix spreads happiness across the globe by offering unforgettable digital entertainment experiences. Yet to better deliver the experience its fans love, the video game operator needed a log analytics solution that could offer full-stack visibility across its on-premises and cloud-based environments, which span three global regions. With proactive problem management as its first objective, Square Enix turned to the Splunk® Data-to-Everything Platform, which has:

  • Boosted productivity with automated, centralized log management
  • Enhanced efficiency by reducing troubleshooting time from weeks to seconds
  • Gained organizational visibility by turning data into actionable analytics

Every Second Counts

Keeping pace with user demands in online gaming has never been easy — especially when Square Enix had to manually monitor server health. “We run our flagship game Final Fantasy XIV by managing massive on-premises and cloud infrastructures across Japan, the U.S. and Europe,” says Junpei Kakefuda of the company’s information system department. “But could you imagine how much effort we spent on analyzing failures?”

“We had to manually review logs from heterogeneous servers to identify the root cause, but the sheer volume of files made prompt analysis difficult. Worse still, we couldn’t properly monitor the content delivery network or correlate the application and infrastructure layers, meaning that a single disconnection or time lag took weeks to resolve. That just didn’t make sense in our modern world, where every second counts.”

To overcome these obstacles, Square Enix turned to Splunk. “For companies who prefer starting small, like us, Splunk is a perfect match,” Kakefuda explains. “We don’t have to pay extra for additional clusters or predefine schemas despite the great variety of log formats.”

Challenges
  • Lacked proactive log analysis for stable operations
  • Delayed troubleshooting and reduced productivity due to manual, time-consuming processes
  • Poor operational visibility without a data-driven analytics platform

Maximizing Data Across Borders

The Splunk platform automatically collects, indexes, correlates and analyzes Square Enix’s ever-increasing log data on a centralized platform, enabling real-time insights through a comprehensive view across virtual and physical environments. This fosters seamless collaboration and full data transparency across the three regions where the business operates.

With Splunk, Square Enix flexibly monitors a wide variety of logs — from system, audit and latency-related logs to session logs from in-house applications. Through an application programming interface, Splunk also takes care of content delivery network data, SaaS audit logs for ID management services and GitHub, and endpoint management logs that contain health check information from the SaaS applications. Square Enix can also customize the retention period and requirements, so that logs can be viewed in real time or collected at regular intervals as needed.

Splunk’s easy-to-use dashboards show event correlations and statistical trends, helping Square Enix better understand user preferences and forecast demand, which in turn informs load balancing and future infrastructure expansion. Kakefuda and his team now access logs without logging in to each host, analyzing multiple types of logs together and gaining deeper operational visibility, with no manual scripts necessary.

From Weeks of Work to Seconds of Smarter Processes

Splunk has played a vital role in Square Enix’s transformation from reactive troubleshooting to proactive risk management. Instead of hastily reading logs to investigate sudden issues, the team now uses automatic log analysis to catch anomalies before they turn into problems. Cumbersome manual reporting has been replaced by simple dashboards and visualizations that make data more accessible to users.

“Issues that would have taken weeks of searching are now identified and resolved within 30 seconds,” says Kakefuda. “Our workspace becomes more efficient and productive, giving us more time to focus on higher-priority assignments.”

If a player is disconnected, for example during a DDoS attack, Splunk technology uses systematic correlation analysis across the collected logs to pinpoint the cause on the server. This knowledge, combined with network-level troubleshooting, enables Square Enix to discover and eliminate hidden dangers, safeguard server stability and maintain its reputation for offering a bug-free gaming experience.
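
To make the idea of correlation analysis concrete, the sketch below pairs a player disconnection with infrastructure events on the same host that occurred shortly before it. It is a simplified illustration in plain Python, not Splunk's implementation or Square Enix's actual searches. The `Event` type, the `correlate` function, the 30-second window, and all sample data are assumptions invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """A hypothetical, simplified log event (not a Splunk data structure)."""
    timestamp: datetime
    source: str   # assumed log source label, e.g. "session" or "network"
    host: str
    message: str


def correlate(disconnects, infra_events, window=timedelta(seconds=30)):
    """Pair each disconnect with infrastructure events on the same host
    that occurred at most `window` before it."""
    matches = {}
    for d in disconnects:
        matches[d] = [
            e for e in infra_events
            if e.host == d.host
            and timedelta(0) <= d.timestamp - e.timestamp <= window
        ]
    return matches


# Sample data invented purely for illustration.
t0 = datetime(2020, 1, 1, 12, 0, 0)
disconnects = [
    Event(t0 + timedelta(seconds=20), "session", "game-01", "player disconnected"),
]
infra_events = [
    Event(t0 + timedelta(seconds=5), "network", "game-01", "inbound traffic spike"),
    Event(t0 + timedelta(seconds=10), "network", "game-02", "inbound traffic spike"),
    Event(t0 - timedelta(minutes=5), "system", "game-01", "routine restart"),
]

result = correlate(disconnects, infra_events)
for d, causes in result.items():
    print(d.message, "->", [c.message for c in causes])
```

Here only the traffic spike on `game-01` is linked to the disconnection: the spike on `game-02` is on a different host, and the restart falls outside the assumed time window. A log analytics platform applies the same principle at much larger scale, across many log sources at once.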

Business Impact
  • Improved data transparency, collaboration and operations for the team, which spans three regions across the globe
  • Reduced time to investigate failures from weeks to seconds, boosting efficiency and uptime
  • Turned data into actionable analytics through improved visibility and predictive capabilities

A Data-Forward Future

With Splunk, Kakefuda is looking to expand log analysis from servers to network and applications. “If we analyze incidents based on game character names and IDs in addition to IP addresses, we will be able to detect unauthorized access, fraud and real money trading.”

Splunk’s value extended beyond log management when Kakefuda discovered that the platform is also a security information and event management (SIEM) tool. “Anyone with SQL experience can pick up Splunk’s search processing language easily or with minimal training,” says Kakefuda. “We can use the Splunk solution in so many ways, and we appreciate the quick response of the Splunk team to any inquiries we have.”

Moving forward, Kakefuda has many more ideas in mind, like using Splunk SmartStore to keep cached data locally while placing other information in remote storage. He also plans to use Splunk technology for higher observability of historical patterns, which will continue to fuel data-driven decision-making across the business.

“Vital for operational excellence, Splunk breaks the visibility bottlenecks of log management, allowing us to see the complete picture of our tech stack and derive exceptional value from real-time analytics.”
— Junpei Kakefuda, Information System Department, Square Enix
Industry: Online Services