Job Description

Site Reliability Engineering / Linux Sys Administrator at RIT Solutions, Inc. summary:

The role of Site Reliability Engineer / Linux Systems Administrator involves designing, building, and maintaining scalable, automated systems and cloud environments to support Imagineering's digital experiences. The position requires expertise in Linux and Windows administration, programming, cloud services, CI/CD pipelines, infrastructure as code, and application monitoring to ensure operational excellence and system reliability. Collaboration with cross-functional teams and troubleshooting complex system issues are key duties, emphasizing automation, performance optimization, and security.

Title: Site Reliability Engineering / Linux Sys Administrator
Location: Onsite in Glendale, CA, 3 days per week
Client prefers candidates with both Linux and Windows experience.
Core must-haves:
1) Linux and Windows system administration; the client is willing to consider Linux-only backgrounds.
2) At least 1 programming language
3) Cloud skills and Public Cloud hosting background
Must haves:
The Senior Systems Engineer is expected to have expert-level systems administration skills on both the Linux and Windows platforms, and must have experience with CI/CD platforms (GitHub Actions, GitLab CI), systems automation (Chef/Ansible/Terraform), systems development (Go, Python, Ruby), cloud automation tools (Boto, CloudFormation, Terraform), source control, cloud hosting, container computing, web technologies, and the DevOps team culture. This position will also bring expertise in systems, operational excellence, and application stability, security, performance, and capacity management, as well as documentation. This position works closely with Imagineering Technology Studio teams to brainstorm, architect, gather requirements, troubleshoot, and provide stellar customer support. The role requires someone who is creative, proactive, constructive, and highly motivated.
EXTERNAL JOB DESCRIPTION:
The Systems Reliability Engineering (SRE) team helps Imagineers create and deliver the software solutions that power experiences in our theme parks and resorts. Systems Reliability Engineers use a software engineering approach to architect, design, automate, monitor, and build applications at scale. This includes operating and engineering software with close business segment alignment to deliver platforms through efficient, effective, and resilient architectures. SREs are talented engineers who focus on improving quality through a data-driven approach: instrumentation, automation, and functional/unit testing.
This position is for an experienced systems engineer eager to play an integral role on the Systems Reliability Engineering team for Company supporting Imagineering to help create, build, and deliver amazing digital experiences to our guests. Primary responsibilities include designing, building, and supporting automated build and deployment systems, platforms, and cloud environments that will be used to assemble and deliver experiences to our Park and online guests.
Job Responsibilities and Duties:
Design: Leading project/planning efforts, architectural design, engineering, attending meetings w/ various teams.
Build: Implementing, integrating and configuring solutions, tools, infrastructure and systems.
Basic Qualifications
Understand how to install and configure operating systems, specifically with expertise in Linux and Windows Server.
Knowledge of software development Continuous Integration (CI) pipelines (GitLab CI, GitHub Actions).
Experience in public cloud hosting services (AWS, Google Cloud, Azure) as well as familiarity with container computing (e.g., Docker, ECS, Kubernetes).
Proficiency in Infrastructure as Code (Terraform, CloudFormation, Bicep, Pulumi).
Experience with Source Control Management systems (Git).
Recognized as a subject matter expert on at least one OS and proficient in multiple operating systems, including OS performance monitoring, setup, configuration, tuning, and troubleshooting.
Proficient in web or web server technologies: Java, Node.js, Tomcat, IIS, Apache/nginx, MySQL, PostgreSQL, etc., including being able to perform basic setup, configuration, and troubleshooting.
Understand internet technologies and network protocols, including basic load balancing configurations, security zones, VIPs, SNMP, REST and DNS.
Able to implement existing base standards for new systems and/or applications with mentoring for all of the following:
- Site monitoring and instrumentation
- Application monitoring and instrumentation
- System monitoring and instrumentation
- Resiliency and performance
Able to diagnose simple to complex system problems.
Able to author tools and scripts to be used by others to automate repeatable production tasks in standard languages like Bash, Ruby, Python, or Go.
Advanced skills in at least one programming language such as Python, PHP, Ruby, Java, Go, Swift or C++ and able to build unit test suites for all software being developed.
Experience supporting and/or developing backend tools or services.
Able to perform and provide in-depth analysis on load test runs against a moderately complex system.
Demonstrates exceptional troubleshooting methodology, including the ability to author new methodologies and teach them to the SRE team.
Independently resolve moderately to highly complex system and application incidents.
Able to identify and propose system and application fixes for performance bottlenecks.
Able to evaluate new application requirements for capacity and run-time best practices.
Able to evaluate new system and/or infrastructure solutions for technical feasibility against known requirements and standards.
Effective at dealing with change: Able to transition in role or handle a significant modification to workflow or technology with minimal ramp-up time and with very little guidance.
Communication and Leadership Requirements
Excellent verbal and written communication to all levels in the organization.
Serves as the primary point of contact with the manager.
Demonstrates curiosity and continuous learning and self-improvement.
Preferred Qualifications
Master of Science degree in computer science or related field, or equivalent experience in technical operations and software engineering.
Required Education
BS in Computer Science or related field with 7+ years o
Additional Information
Bachelor of Science degree in computer science or related field, or equivalent experience in technical operations and software engineering.

Keywords:

Site Reliability Engineering, Linux System Administration, Windows Server, Cloud Computing, CI/CD, Infrastructure as Code, Automation, DevOps, Containerization, Performance Monitoring

Job Tags

3 days per week
