Senior Manager, Infrastructure Operations

Posted over 1 month ago

Job Description

The Senior Manager of Infrastructure Operations will lead and manage the Infrastructure Operations team and the engineers supporting colocation and AWS infrastructure and automation for applications across FFN as part of the migration and Cloud First journey. The role is a key contributor to stability, scalability, governance, and project decisions, and participates in a cloud operations decision group, both collaboratively within Intel and as an externally regarded leader in the space.

This position requires a highly motivated, proactive leader with the leadership, collaboration, and communication skills necessary to forge a partnership with the senior leaders in the business units and within Technology. Additionally, this leader should bring broad knowledge of IT and Freedom's business, along with key relationships across the company.

The Senior Manager will be a member of the IT leadership team and report directly to the SVP, Technology Operations.

Primary Accountabilities

The Senior Manager, Infrastructure Operations is responsible for:

Lead the teams supporting Cloud application infrastructure for various cloud initiatives at a large technology or fintech organization
Lead a team of 10+ talented Network, Systems, Cloud Infrastructure and DevOps engineers
Ensure team delivers with high quality and predictability
Partner with DevSecOps, Architecture, API, Delivery, Security organizations while building highly scalable, secure AWS Cloud Infrastructure as code.
Partner closely with peer Engineering & Technology leaders to ensure we operate as a single team
Proven leadership with ability to lead multiple teams in a fast-paced multi-disciplinary environment
A willingness to mentor people inside/outside of the Information Technology department on best practices, system design principles, and computing industry trends
Continuously manage, monitor, and update architecture models as business needs evolve and additional cloud services become available.
Have managed production infrastructure sites for front and back-end services
Good knowledge of Linux internals and administration
Deep knowledge of infrastructure as code principles; knowledge of Terraform is a must.
Deep experience with AWS (Cloud Computing: EC2, S3, RDS, VPC, Security Groups, ELB...)
Able to define actionable monitoring and alerting for systems
On-call experience dealing with production incident management and resolution
Cloud Expert: Well versed in AWS services for monitoring, logging, metrics, high availability, and automation
Operationally Focused: Passionate about monitoring, resiliency, uptime, performance and automation
Effective Communication: Excellent listener; proven collaborator with superiors, peers and staff
Automation Driver: Constantly look for automation opportunities
Curious: Hands-on, "roll up your sleeves" collaborative style of working
Passionate: Bring energy and enthusiasm to the job and organization
Achiever: Consistently attain/exceed individual and team goals
Multitasker: Ability to juggle multiple work items
Enjoy problem solving: Ability to find creative and reliable solutions to complex problems
Define Service Level Objectives (SLOs) and perform the work required to ensure we meet them.
Knowledge of networking and monitoring
Strong communication skills with an ability to relay incident details expeditiously, concisely, and accurately
Proficient leading remote online collaborative meetings adhering to project management principles and documentation
Strong organizational skills with extremely high level of attention to detail
Highly motivated, quality conscious self-starter that requires little to no supervision, able to own tasks from start to finish
Customer focused - Investigates and resolves customer issues and inquiries (i.e., emergency and non-emergency)
Identify, receive, triage and act upon events and incidents coming from various SaaS services
Consistently meets or exceeds established Command Center key performance indicators (KPIs)
Work per escalation, notification and incident practices
Monitor the availability of the CI/CD environments
Working under pressure in production environments running production customer workloads and services
Previous knowledge or strong desire to learn about crisis management issues.
Primarily focus on 24x7x365 eyes-on-glass monitoring, alerting, requests, and troubleshooting to include:
Alert verification and validation of false positives in alignment with SOPs
Performing daily system monitoring, verifying the integrity and availability of cloud infrastructure, server resources, systems and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups, live data feeds, and batch processing
Managing internal and external access requests, including approvals and general user administration in alignment with user access control policy
Facilitating scheduled and ad-hoc requests including, but not limited to, application restarts and instance resizing
Sending internal and external communications for scheduled maintenance and high-priority major incidents
Triaging all support requests and performing preliminary investigation for all reported issues
Attempting to provide first-call resolution for all reported issues by researching documentation and knowledge base
Performing root cause analysis (RCA) and drafting customer-facing summary of events and preventative measures

Primary Contacts

SVP, TechOps (reports to the SVP) - daily
IT Leadership - regularly
Business unit and functional executives - regularly
Outside vendors and technology leaders in other companies - regularly

Job Requirements

Bachelor's or Master's degree in computer science, information systems, business administration or related field.
10 or more years in IT and business/industry
Five to seven years of leadership responsibility in managing multiple, large, cross-functional teams or projects and influencing senior-level management and key stakeholders
Proven experience in working with external service providers
Demonstrated effective leadership, teamwork and influencing skills
Very strong budgeting, planning, and financial management skills (prior experience in IT budgeting and forecasting)
Exceptional project management skills, including the ability to effectively deploy resources and manage multiple projects of diverse scopes in a cross-functional environment
Excellent oral and written communication skills, including the ability to explain technology solutions in business terms, establish rapport and persuade others
Excellent interpersonal and communication skills (written, verbal, presentation, negotiation), including the ability to communicate effectively with people at different job levels within the organization.
5+ years of experience managing Cloud Operations and support teams
Hands-on experience with typical project and system/customer support. This includes planning, coordinating, customer education and support, troubleshooting, problem resolution, product evaluation, and documentation. Additional required experience includes:
Implementation, management, and administration of Enterprise systems tools and processes
Granting SSH and RDP access
Network configuration of Firewalls, VPN, Routers/Switches, and Load Balancers
Troubleshooting and resolving single customer issues with Windows, Mac, and Linux, VPN, permissions, and ownership of a wide variety of account administration tasks.
Following ITIL processes (Incident, Change and Problem Management)
Experience with AWS Managed Services (EC2, DynamoDB, RDS, Lambda)
Experience with AWS Networking & Security Groups and their underlying technologies (Route53, VPC, ALB, Security Groups)
Experience in Linux environments (Ubuntu, Amazon Linux)
Experience in Infrastructure as Code (Terraform, GitLab CI/CD)
Knowledge of one or more programming/scripting languages (Python, Go, Bash)
Knowledge of container platforms (Docker, Kubernetes, ECS)
Knowledge of configuration management and automation tools (Puppet, Chef, Ansible, SaltStack)
Knowledge of agile software development practices and release management
Good teamwork skills and attention to detail

Freedom Financial Network
