VLink Inc. is a global software engineering and IT staffing partner that delivers innovative solutions through highly vetted, expert software development teams. We leverage the latest technologies and the best IT talent to drive business growth for Fortune 500, large, and SMB clients, taking a customized, personal approach to ensure their unique technology needs are met. Founded in 2006, VLink takes pride in its highly regarded workforce, whose productivity, tech agility, and expertise produce transformative customer success stories year after year.
Job Role: TechOps Engineer
We are seeking a highly skilled and motivated TechOps Engineer to join our dynamic team. In this role, you will be responsible for debugging and resolving issues related to applications deployed in our OpenShift Container Platform (OCP) environment. Additionally, you will leverage the ELK (Elasticsearch, Logstash, Kibana) stack for enhanced observability and monitoring of application performance. The ideal candidate will have a strong background in OCP, application troubleshooting, Python scripting, and ELK.
Requirements:
Experience with monitoring tools such as Prometheus, Grafana, or similar.
3+ years of experience in a TechOps or DevOps role, with a focus on application debugging in OCP.
Strong knowledge of OpenShift Container Platform (OCP) and Kubernetes.
Excellent problem-solving skills and the ability to work under pressure.
Strong communication and collaboration skills.
Proficiency in Python scripting for automation and diagnostic tasks.
Experience with RHCOS (Red Hat Enterprise Linux CoreOS), including hands-on work with OCS.
Experience with other scripting languages such as Shell/Bash or Java will also be considered.
Job Description:
Diagnose and resolve issues related to applications running on OpenShift Container Platform (OCP).
Collaborate with development and DevOps teams to identify root causes and implement corrective actions.
Use monitoring tools to track application metrics and logs for proactive issue detection.
Develop and maintain Python scripts to automate operational tasks and processes. Integrate Python scripts with existing systems and tools to enhance efficiency and productivity.
Respond to and resolve technical incidents and service requests in a timely manner.
Collaborate with cross-functional teams to identify root causes and implement corrective actions.
Document incident resolutions and create knowledge base articles for future reference.
Provide on-call support as needed to ensure system availability and uptime.
Generate regular reports on system metrics, incidents, and operational activities.