Data Scientist

The Data Scientist helps discover hidden information in large datasets, metrics, and other data fields, creating insight that enables Agency leadership to make more informed decisions. The Data Scientist applies data mining techniques, conducts statistical analysis, builds high-quality prediction systems integrated with Agency products, and applies new industry standards and best-in-business practices.

The ideal candidate is adept at using large data sets to find opportunities for product and process optimization and at using models to test the effectiveness of different courses of action. Candidates must have strong experience with a variety of data mining and data analysis methods and data tools, building and implementing models, using and creating algorithms, and creating and running simulations. Active IRS MBI clearance is a plus.

Principal Duties and Responsibilities:

  • Identify key datasets and metrics, combine fields of data from various sources, and analyze them to develop actionable insights that help Agency Senior Leaders understand how the business performs, ultimately building AI tools that automate certain processes within Agency applications.
  • Enhance data collection procedures to include information that is relevant for building analytic systems.
  • Process, cleanse, and verify the integrity of data used for analysis.
  • Conduct ad hoc analysis and present results in a clear, direct, and actionable format (e.g., using Project Management Institute terminology).
  • Select features and build and optimize classifiers using machine learning techniques, and perform data mining using state-of-the-art methods.
  • Work with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions.
  • Mine and analyze data from company databases to drive optimization and improvement of product development, marketing techniques and business strategies.
  • Develop the company A/B testing framework and test model quality.
  • Create automated anomaly detection systems and continuously track their performance.
  • Develop processes and tools to monitor and analyze model performance and data accuracy.

Required Skills:

  • Active IRS MBI clearance is a plus.
  • Experience using statistical computer languages (R, Python, SQL, etc.) to manipulate data and draw insights from large data sets.
  • Excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.
  • Experience with common data science toolkits, such as R, Weka, NumPy, MATLAB, etc.
  • Experience with NoSQL databases, such as MongoDB, Cassandra, HBase
  • Good applied statistics skills, such as distributions, statistical testing, regression, etc.
  • Strong problem-solving skills with an emphasis on product development.
  • Data-oriented personality
  • Excellent written and verbal communication skills for coordinating across teams.


  • Bachelor’s Degree in Information Technology, Computer Science or related field required
  • Minimum 4 years of experience as a Data Scientist
  • Professional certifications


  • Coding knowledge and experience with several languages: C, C++, Java, JavaScript, etc.
  • Knowledge and experience in statistical and data mining techniques: GLM/Regression, Random Forest, Boosting, Trees, text mining, social network analysis, etc.
  • Experience using web services: Redshift, S3, Spark, DigitalOcean, etc.
  • Experience analyzing data from third-party providers: Google Analytics, SiteCatalyst, Coremetrics, AdWords, Crimson Hexagon, Facebook Insights, etc.
  • Experience with distributed data/computing tools: MapReduce, Hadoop, Hive, Spark, Gurobi, MySQL, etc.
  • Experience visualizing/presenting data for stakeholders using: Periscope, Business Objects, D3, ggplot, etc.