A Successful Data Science Model Needs GitHub. Here’s Why.

  1. GitHub
  2. Benefits
  3. Summary
  4. References

GitHub keeps teams and cross-functional members of an organization on the same version of a codebase using Git, and lets them comment on and approve new code changes that have been requested and documented through a pull request.

Below, I will share and describe the benefits of GitHub for a data science project.

Photo by Jantine Doornbos on Unsplash [2].

GitHub [3] documents and guides software developers, designers, and project managers through the use of Git, pull requests, issues, wikis, and gists. Setting up a data science project is fairly simple and gives your team checks and balances on its files and code. Git is the version control system itself: you run it in your terminal to switch branches, record code changes, and keep a history of every version. Gists are useful for sharing code snippets, for example, when you do not want to share an entire data science project. Below, I will discuss the benefits of GitHub.

Version Control (Git): Git commands let you record new versions of your codebase and push them to GitHub. Once a branch has been pushed, a pull request can be created so that your data science model code can be reviewed and improved. Here are some common, useful Git commands:

  • list your branches and see which one you are on: git branch
  • create a new branch off of your current branch, such as master (then switch to it with git checkout branch_name): git branch branch_name
  • pull your master branch so it is up-to-date: git pull
  • check which files have changed and which branch you are on: git status
  • stage your code changes from your branch: git add file_name
  • commit your changes from your branch: git commit -m "Added change"
  • push your changes from your branch: git push
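
The commands above can be strung together into one end-to-end run. The following is only a minimal sketch under stated assumptions: a local bare repository stands in for the GitHub remote, and the names model.py and feature_scaling are hypothetical placeholders, not taken from a real project. It runs in a throwaway directory so it does not touch any existing repository.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing touches a real project.
workdir=$(mktemp -d)
cd "$workdir"

# Assumption: a local bare repository plays the role of the GitHub remote.
git init --quiet --bare remote.git
git init --quiet project
cd project
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$workdir/remote.git"

# Seed master with one commit and publish it (model.py is hypothetical).
echo "print('baseline model')" > model.py
git add model.py
git commit --quiet -m "Add baseline model"
git push --quiet -u origin master

# Bring master up to date, then create a branch off it and switch to it.
git pull --quiet
git branch feature_scaling
git checkout --quiet feature_scaling
git branch            # the current branch is marked with *

# Make a change, inspect it, stage it, commit it, and push the branch.
echo "print('scaled features')" >> model.py
git status --short
git add model.py
git commit --quiet -m "Added change"
git push --quiet -u origin feature_scaling
```

On GitHub itself, the pushed branch would then be the starting point for opening a pull request against master. The -u flag on the first push of each branch records which remote branch it tracks, which is what allows a bare git pull or git push to work afterwards.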

Pull Requests: this is an extremely useful part of the GitHub platform. With pull requests, often called PRs, you can have a second, third, or further set of eyes on your code changes. When you want to add code to an existing master branch, you create your own branch that includes the new code and open a pull request for it. People on your team then view and test it to make sure that your additions are correct before it is merged. The PR process is beneficial not only because it catches mistakes and ensures people double-check your work, but also because it keeps everyone on your team on the same page. When others have to view your changes and approve the new code, they refresh their knowledge of the model as it expands to more files and systems.

Collaboration: using GitHub brings collaboration among multiple team members, who can include other data scientists, software engineers, data engineers, and product managers. Their input can make your data science model more robust, more efficient, and possibly more accurate. Because everyone relevant to the model can be included in one place, the whole project benefits.

Gists: these are useful if you want to share a small code snippet with others, or even here, right on Medium, where the code is displayed with highlighting for its programming language. A gist is an easy way to show an example of your code. When you designate the programming language, say Python, by using a .py file name, the code is color-coded; for example, the import statement is highlighted in red. Below is a gist to serve as an example:

Example of a gist. Code by Author [4].

Data science in academia tends to focus on theory, concepts, and the code for common machine learning models rather than on GitHub. The platform deserves more attention before students enter the workforce and immediately have to start working with others. To sum up, GitHub is beneficial in developing a successful data science model.

To find out more about the Git part of GitHub, see this article [5].

I hope you found this article interesting and useful. Thank you for reading!

[2] Photo by Jantine Doornbos on Unsplash, (2017)

[3] GitHub, Inc., GitHub main page, (2020)

[4] M. Przybyla, pandas-append.py, (2020)

[5] M. Przybyla, Common Git Commands Every Data Scientist Needs To Know, (2020)