AI and Data: Enhancing Development with GitHub Copilot

ODSC - Open Data Science
5 min readOct 12, 2024

--

Editor’s note: Mabel is a speaker for ODSC West this October 29th-31st. Be sure to check out her talk, “Gen AI in Software Development. What should you be looking for?,” there!

Artificial Intelligence (AI) is not a new concept in the data world. For years, the industry has been developing AI and Machine Learning (ML) models to predict and better understand the data that surrounds us. However, we are now witnessing a new phase — AI that can assist in creating AI or, more relevant to this article, AI that can help us write any piece of code. This includes tasks ranging from creating a cleaning pipeline in Jupyter Notebooks, SQL stored procedures, or Python functions.

One such AI tool is GitHub Copilot. GitHub Copilot can be used in environments like Visual Studio Code, JetBrains IDEs, or Azure Data Studio to significantly reduce coding time. For instance, it can help create synthetic data at the start of a project, offer quick recommendations for a Jupyter notebook draft specific to your use case, and reduce the need to switch contexts by providing the code syntax you need right away. This is why the industry is increasingly adopting these AI tools. Regardless of which code assistant tool you choose, this article aims to provide a sneak peek into the main elements to consider when selecting your companion, which will be explored in more detail in my upcoming talk at the ODSC West 2024 Conference.

To fully maximize the value of AI assistants, we need to consider these three main elements:

  1. Context Awareness & Customizability

Prompt Engineering is key to getting appropriate responses from the model. However, customization is equally important. Typically, these models are predefined and due to privacy restrictions, learning based on our patterns and data is limited. Therefore, it’s crucial that the tool includes functionalities for defining the behavior of the assistant through concepts like system prompting or by leveraging an instructions file.

Another crucial aspect is Retrieval Augmented Generation (RAG). According to the paper “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” RAG involves providing additional knowledge to a pre-trained model to make responses more accurate without additional training. Finally, a third layer is fine-tuning or custom models. This capability allows the organization to tune the model with their own codebase, enabling the assistant to provide more customized responses. These features are a game changer for code assistant tools. Understanding where this additional knowledge comes from, how to provide it, and addressing any privacy concerns is vital for making informed decisions while getting the most from the available technology.

  1. Integration

If the tool isn’t present where you are, you’ll add unnecessary overhead, counteracting the benefits. Understanding the list of supported ecosystems and the tool’s capability for custom integrations is key. The closer the solution is to your development environment, the less context switching and the more productive you’ll be.

Custom integration capabilities are a new trend among AI Code Assistant tools. While SaaS code assistant tools are useful, expanding them for better applicability to specific use cases can be challenging, as it can be creating one completely from scratch. Strive to look for a balance between out-of-the-box capabilities and the flexibility to make necessary adjustments when needed.

  1. Additional Capabilities

Studies conducted by organizations like Software.com and Tidelift indicate that, on average, 75% of a software development project’s time isn’t spent on coding but on other lifecycle stages. While AI’s contributions at the development environment level are significant, its potential extends far beyond.

– Collaboration: AI that can assist in answering questions about your organization’s documented knowledge base.

– Troubleshooting: AI that can help troubleshoot and improve pipelines needed for retraining models.

– Security: AI that can suggest fixes for code vulnerabilities and recommend enhancements for code quality.

These are just some examples of activities that consume our time beyond coding tasks. I’m sure that along with these, there are other examples that may come to mind as you read this, which highlights the importance of finding a solution that combines coding assistance with additional capabilities. Even if you’re not ready to engage in such advanced activities currently, it’s beneficial to have them on your radar and choose a companion that offers support in these areas.

Privacy, though not always explicitly mentioned, should be a fundamental consideration in all our AI-related evaluations. It’s important to make sure the services you use transparently explain how they handle your data and respect privacy requirements. Data is a valuable asset, and responsible use and management are vital to determining whether these types of projects are feasible for an organization to adopt. Companies like Microsoft and GitHub uphold high privacy standards and are members of various industry boards. I highly recommend that all users review the terms of service for any AI solution they consider using and evaluate whether they meet your organization’s privacy and security standards, even before initiating a trial.

Lastly, once you have chosen a tool, concentrate on improving your skills as a prompter. Prompt Engineering in Gen AI models is often the pitfall in getting the most from these tools. Despite being entirely within our control, we often overlook key elements that determine the quality of the model’s response. Remember: single, short, and specific prompts are essential. Tune in to my talk to learn more.

As a Solutions Engineer at GitHub, I’m proud to witness how GitHub Copilot meets these requirements, which is why over 1.8 million customers, and 70,000 organizations trust us for their initiatives. This makes GitHub Copilot the most widely used AI code assistant globally. If you’re also interested in learning more about the companies using it and their use cases, join me at the ODSC West 2024 conference session titled: “Gen AI in Software Development. What should you be looking for?

About the Author/ODSC West 2024 Speaker:

Mabel Geronimo is currently a Solutions Engineer at GitHub, with over 8 years of experience in the technology sector. Presently, she supports leading organizations in Latin America with their modernization plans, focusing on development processes through DevSecOps practices. Originally from the Dominican Republic, she graduated from the Instituto Tecnológico de Santo Domingo (INTEC) with a degree in Systems Engineering. She currently resides in Austin, Texas, and describes herself as a technology enthusiast and a firm believer in the power of young minds.

--

--

ODSC - Open Data Science
ODSC - Open Data Science

Written by ODSC - Open Data Science

Our passion is bringing thousands of the best and brightest data scientists together under one roof for an incredible learning and networking experience.

No responses yet