Schedule - Day 2
2019-Nov-17 | 8:30–9:00 | Breakfast & Registration
2019-Nov-17 | 9:00–10:00 | Lightning Talks
2019-Nov-17 | 11:00–11:25
Talk room #1
Mining is one of the world’s oldest industries. Since the beginning of time, we have been using resources to fuel innovation and progress. However, the mining industry is beginning to undergo major changes and embrace innovation. GoldSpot Discoveries was born out of a hackathon in 2016, and in three years debuted on the TSXV by using AI for mineral exploration. In this talk join us to learn how GoldSpot is using the power of data and AI to change mineral exploration, some of the challenges we have encountered, and how we’re planning to change the industry.
Talk room #2
Some languages (C++, Python) support operator overloading, and their communities are quite happy to use this nifty feature for more readable and expressive code. And some languages (Java, Go) are resolutely opposed to this terrible idea, and would sooner die than let you change the meaning of + or /. Why the divide? I'll explore some of the uses and abuses of operator overloading, and suggest ways to use it appropriately.
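As a rough illustration of the feature under discussion (not an example from the talk itself), here is a minimal Python sketch of operator overloading for a small vector type:

```python
# Hypothetical example of operator overloading (not from the talk itself):
# a small 2-D vector type that gives + and * meanings that read naturally.
class Vec2:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):          # v + w
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):         # v * 3
        return Vec2(self.x * scalar, self.y * scalar)

    def __repr__(self):
        return f"Vec2({self.x}, {self.y})"

print(Vec2(1, 2) + Vec2(3, 4))   # Vec2(4, 6)
print(Vec2(1, 2) * 3)            # Vec2(3, 6)
```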
PyData Track (Round Room)
For a Python developer, using PySpark can often feel foreign, like driving a race car in sandals. You see the power, yet it feels like you're fighting against the machine. This talk is about battle stories using PySpark from development to production, and how my many errors can lead to better code on your end. In no particular order, I'll discuss speeding up your development, avoiding 'friendly enemies', and testing your code. You'll see how to avoid embarrassing mistakes by watching me make them, and you'll leave a more insightful PySpark developer.
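As a hedged sketch of the kind of code the talk deals with (not the speaker's own examples), one common pattern is to keep PySpark transformations as plain functions so they can be exercised against a local SparkSession:

```python
# Hypothetical sketch (not the speaker's code): keep transformations as plain
# functions over DataFrames so they can be tested against a local SparkSession.
from pyspark.sql import SparkSession, functions as F

def add_total(df):
    """Pure transformation: easy to unit-test in isolation."""
    return df.withColumn("total", F.col("price") * F.col("quantity"))

if __name__ == "__main__":
    spark = SparkSession.builder.master("local[2]").appName("demo").getOrCreate()
    df = spark.createDataFrame([(2.0, 3), (5.0, 1)], ["price", "quantity"])
    add_total(df).show()
    spark.stop()
```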
Tutorial (Clipper Room)
2019-Nov-17 | 11:40–12:05
Talk room #1
This talk is a destructive examination of the workings of Python's inheritance model. We'll learn how it works by breaking it. By breaking inheritance in various ways, we will learn about the hooks that Python gives us to interact with it. These include using the metaclass to alter the class during construction, and two dunder methods which serve as useful hooks. Particular examples will include: Implementing interfaces in Python, turning inheritance off in favor of explicit reuse, and simplifying classes by disabling a selection of features you don't wish to use. While none of these are meant to be particularly useful implementations, they make good use cases to show the machinery.
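The abstract does not name the two dunder hooks; as a hedged guess at one likely candidate, __init_subclass__ can enforce a simple interface at class-creation time:

```python
# Hedged sketch (my guess at one of the hooks, not the speaker's slides):
# __init_subclass__ runs when a subclass is created, so a base class can
# enforce an "interface" without abc or a custom metaclass.
class Serializer:
    required = ("dumps", "loads")

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        missing = [name for name in cls.required
                   if not callable(getattr(cls, name, None))]
        if missing:
            raise TypeError(f"{cls.__name__} is missing methods: {missing}")

class JsonSerializer(Serializer):
    def dumps(self, obj): ...
    def loads(self, data): ...

# class BrokenSerializer(Serializer):   # would raise TypeError at class creation
#     pass
```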
Talk room #2
Since Python 3.5 and PEP 492, we have been able to write asynchronous programs in an easy and Pythonic way without external libraries. Even so, it is still difficult to understand what asynchronous programming is all about and when we, Python developers, should consider using it. This talk will give you a gentle introduction to the world of asynchronous programming, focusing mostly on the core concepts of async programming, how it works, and what its applications are, in order to provide Python developers with a good foundation on the topic. On top of that, we will explore a small code example (mostly involving the built-in asyncio) and briefly examine the source code of CPython to find out how it works. The talk will also offer a brief comparison with threading.Thread and ThreadPoolExecutor.
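The talk's own example is not reproduced here; as a minimal, hedged sketch of the pattern involved, this shows asyncio awaiting two tasks concurrently:

```python
# Minimal sketch of the asyncio pattern (illustrative only, not the talk's example).
import asyncio

async def work(name, delay):
    await asyncio.sleep(delay)          # stand-in for real I/O (network, disk, ...)
    return f"{name} done after {delay}s"

async def main():
    # Run both coroutines concurrently; total time ~2s, not 3s.
    results = await asyncio.gather(work("a", 1), work("b", 2))
    print(results)

asyncio.run(main())                      # asyncio.run() is available since Python 3.7
```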
PyData Track (Round Room)
Why is feature engineering considered the ‘dark art’ of machine learning? Transforming raw data into a form that your machine learning algorithm can utilize seems mysterious and downright frightening! Bring your wizard hat and join me as this machine learning apprentice shares her personal book of feature engineering incantations.
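As a generic, hedged illustration (not the speaker's "incantations"), two of the most common feature-engineering transformations are encoding categories and taming skewed numeric columns:

```python
# Generic feature-engineering sketch (illustrative, not from the talk):
# one-hot encode a categorical column and log-transform a skewed numeric one.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["Toronto", "Montreal", "Toronto"],
    "income": [45_000, 120_000, 2_000_000],    # heavily skewed
})

features = pd.get_dummies(df, columns=["city"])        # categorical -> indicator columns
features["log_income"] = np.log1p(features["income"])  # compress the long tail
print(features)
```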
Tutorial (Clipper Room)
2019-Nov-17 | 12:20–12:45
Talk room #1
Do you hate losing at board games? Do you have too many friends and too few enemies? Then this talk is for you! I will demonstrate rapid prototyping of board games in Python, training simple yet powerful game AI, and extracting their strategies in a human-understandable way to ultimately defeat all your board game foes. This talk will cover helpful newly-introduced Python features for rapid prototyping, using keras for games, and Monte Carlo Tree Search. You will see the effectiveness of these techniques on popular games such as Settlers of Catan and Machi Koro.
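The talk's prototypes are not shown here; as a heavily simplified, hedged sketch of the rollout idea behind Monte Carlo Tree Search, this evaluates moves in single-pile Nim by flat Monte Carlo playouts:

```python
# Heavily simplified sketch (illustrative only, far simpler than the talk's
# keras + MCTS approach): flat Monte Carlo move evaluation for single-pile Nim,
# where players take 1-3 stones and whoever takes the last stone wins.
import random

def random_playout(stones, my_turn):
    """Play random moves to the end; return True if 'I' take the last stone."""
    while stones > 0:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return my_turn              # whoever just moved wins
        my_turn = not my_turn
    return not my_turn                  # pile already empty: previous mover won

def best_move(stones, n_playouts=5000):
    """Score each legal first move by its win rate over random playouts."""
    scores = {}
    for take in range(1, min(3, stones) + 1):
        wins = sum(random_playout(stones - take, my_turn=False)
                   for _ in range(n_playouts))
        scores[take] = wins / n_playouts
    return max(scores, key=scores.get), scores

print(best_move(10))   # usually picks 2, leaving a multiple of 4 (the optimal move)
```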
Talk room #2
Today, we are moving towards machine learning: making predictions and uncovering insights from data. The first step is to have efficient processes in place for collecting data from many different sources. Traditional ways of collecting data are tedious and cumbersome: manually running scripts to extract, transform and load data is a trade-off with time. To make the process efficient, the data pipeline can be automated. Scripts to extract data can be scheduled with crontab, but crontab has its own drawbacks, and one major challenge is monitoring. This is where Apache Airflow, an open-source tool built by the Airbnb engineering team, can help. Airflow is a platform to programmatically author, schedule and monitor workflows.
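As a hedged sketch only (the talk's pipeline is not reproduced, and the import paths assume the Airflow 1.10-era API current in 2019), a minimal DAG looks roughly like this:

```python
# Minimal Airflow DAG sketch (assumes the Airflow 1.10-era import paths from 2019;
# the talk's actual pipeline is not reproduced here).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def extract():
    print("pull data from the source system")

def load():
    print("write transformed data to the warehouse")

default_args = {"owner": "data-team", "retries": 1,
                "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_etl",
    default_args=default_args,
    start_date=datetime(2019, 11, 1),
    schedule_interval="@daily",   # what crontab did, plus retries, logging and a UI
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task     # simple dependency: extract before load
```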
PyData Track (Round Room)
Data on energy markets in Canada is abundant. Various datasets are produced by different levels of government, private companies and industry associations, international organizations, and academia. However, this data is not always standardized and is not centralized. Often, data is not comparable because units are not uniform (e.g. imperial vs. metric) and temporal periods vary (annual, monthly, daily). Formats range from CSV and Excel to HTML and PDF. Before any analytics can begin, the data needs to be extensively cleaned up. The Canadian Energy Regulator (CER), in cooperation with three other federal agencies, recently launched the Energy Indicators, a series of interactive dashboards that display the most recently available public information on energy commodity markets in Canada (crude oil, natural gas, electricity and Canada's energy transition). Jupyter Notebooks accompanying each dashboard have also been created to provide a transparent data-access methodology, allow reproducibility by others, and clear the path for more advanced analytics related to Canada's energy information.
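As a generic, hedged illustration of the kind of clean-up described (not CER code), harmonizing units and rolling monthly records up to annual totals in pandas might look like:

```python
# Hypothetical clean-up sketch (not the CER's code): convert imperial volumes
# to metric and roll monthly records up to annual totals with pandas.
import pandas as pd

BARRELS_TO_M3 = 0.158987   # approximate conversion, 1 barrel ≈ 0.159 cubic metres

monthly = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=24, freq="MS"),
    "volume_bbl": range(100, 124),
}).set_index("date")

monthly["volume_m3"] = monthly["volume_bbl"] * BARRELS_TO_M3      # imperial -> metric
annual = monthly.groupby(monthly.index.year)["volume_m3"].sum()   # monthly -> annual
print(annual)
```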
Tutorial (Clipper Room)
2019-Nov-17 | 12:45–13:45 | Lunch
2019-Nov-17 | 14:45–15:10
Talk room #1
This talk will go through surprising Python features, as well as some code golf to explore surprising applications of unsurprising features. Disclaimer: this probably won't help you do your job.
Talk room #2
How many times has a library just handed you a big-ol' messy bag of data? You know, when a function hands you an imprecise dict, or an overlong tuple? What are you supposed to do with that? Python has great default data structures, but until recently it hasn't come with a lot of great support for structuring data; that is, the available options rarely encourage the right thing, which is often a custom made data structure, or record type. But there are libraries and modern features to help with this problem and allow you to export neatly organized types quickly and easily. These tools make it easier to box up your data and give it to consumers in a neat, friendly, and controlled way.
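The abstract does not name specific libraries; as one hedged example of the "modern features" it alludes to, the standard library's dataclasses (Python 3.7+) turn a loose dict into a typed record:

```python
# Hedged example of "structuring data" with the standard library's dataclasses
# (Python 3.7+); the talk may well cover other libraries such as attrs too.
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    id: int
    name: str
    email: str

raw = {"id": 42, "name": "Ada", "email": "ada@example.com"}   # the "messy bag"
user = User(**raw)              # now a typed, self-documenting record
print(user.name)                # attribute access instead of string keys
```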
PyData Track (Round Room)
With a continued shortage of rental units, finding the ideal apartment in Toronto, let alone one you can afford, is a daunting and time-consuming task. But rest assured, Python plus a bit of web scraping can go a long way. This talk will highlight how you can use Python, PostGIS, Slack, and some unsupervised clustering to find a place to live.
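The speaker's pipeline is not reproduced here; as a heavily simplified, hedged sketch of the clustering step alone, grouping toy listings by location and price with scikit-learn could look like:

```python
# Heavily simplified sketch of the clustering step (illustrative only; the real
# pipeline also involves web scraping, PostGIS and Slack notifications).
import numpy as np
from sklearn.cluster import KMeans

# toy "listings": latitude, longitude, monthly rent (hypothetical values)
listings = np.array([
    [43.665, -79.385, 2100],
    [43.668, -79.390, 2250],
    [43.640, -79.420, 1800],
    [43.642, -79.425, 1750],
    [43.700, -79.350, 2600],
])

# scale rent down so location and price contribute on comparable scales
features = listings * np.array([1.0, 1.0, 1 / 1000])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)    # cluster id per listing
```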
Tutorial Room
While Python is an extremely versatile language, it isn't exactly known for its blazing performance. When developing math-intensive applications, particularly on low-power devices such as single-board computers, this can become a real issue. This talk provides an introduction to vectorization and libraries designed to support it (such as Numpy), giving you the tools you need to eliminate those pesky for loops and achieve a drastic performance boost.
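As a minimal, hedged sketch of the idea (not the tutorial's material), here is the same computation written as a Python loop and as a single NumPy call:

```python
# Minimal vectorization sketch (illustrative, not the tutorial's code):
# the same sum of squares as a Python loop and as one NumPy call.
import numpy as np

xs = np.random.rand(1_000_000)

total = 0.0
for x in xs:                 # interpreted, element by element
    total += x * x

total_vec = np.dot(xs, xs)   # one call, the per-element work happens in C

assert np.isclose(total, total_vec)
print(total_vec)
```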
2019-Nov-17 | 15:25–15:50
Talk room #1
In today’s world of developing services, we tend to move fast, and with that come mistakes. This talk will discuss using post-mortems to turn incidents into opportunities for improvement, instead of just opportunities to assign blame. While developing software, bugs and mistakes are inevitable. Come hear how we can improve the approaches we often take as software developers to work better with one another in heated moments of failure and in the aftermath of incidents. Through better interactions we can build better teams and create better services. In my career I have worked in both blameless and blame-full post-mortem environments, across a variety of projects ranging from individual Python libraries to core infrastructure for a cloud. I am excited to share how I think not assigning blame when things go wrong results in a better team and a better product.
Talk room #2
Programming is not just about writing runnable code, but about writing maintainable and extensible applications. Beyond general design issues, many code quality aspects need attention, such as PEP 8 compliance, test coverage, security, and so on. Checking them manually can be time-consuming and error-prone for both authors and reviewers, so I'd like to introduce tools that can help with each of these checks. In this talk, you will hear a brief introduction to over a dozen tools. By leveraging tools, we get rid of repetitive and tedious tasks and focus on higher-level software design. In the end, I'll propose a workflow combining all these tools that can be quickly adopted and adjusted for any software project.
PyData Track (Round Room)
Our entire world wide web is already composed of linked data. It is not surprising that cultural institutions which provide and curate open data, such as our libraries, archives and museums, have begun to transform their metadata into linked data. In this talk we will explain why linked data is powerful and demonstrate how to extract data insights from it using the Python modules RDFlib and plotly. RDFlib is a powerful library for working with triple data and representing information. As we will learn in this talk, linked data is queried with a query language called SPARQL, which RDFlib supports. We’ll move from parsing data to bringing out your inner artist with plotly to create visualizations. The plot will thicken when we briefly touch upon how machine learning can be applied to linked data, and the ways in which working with linked metadata is different and has unique promises not present in other forms of linked data. By the end of this talk you will be able to see for yourself how to draw relationships out of open linked data and the value of communicating the relationships visible in linked data.
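As a small, hedged sketch of the RDFlib workflow described (with made-up triples rather than the cultural-heritage datasets from the talk):

```python
# Small RDFlib + SPARQL sketch with made-up triples (the talk works with real
# library/archive/museum linked data instead).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, FOAF

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.ada, RDF.type, FOAF.Person))
g.add((EX.ada, FOAF.name, Literal("Ada Lovelace")))

# SPARQL query over the in-memory graph
results = g.query("""
    SELECT ?name WHERE {
        ?person a <http://xmlns.com/foaf/0.1/Person> ;
                <http://xmlns.com/foaf/0.1/name> ?name .
    }
""")
for row in results:
    print(row[0])   # -> "Ada Lovelace"
```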
Tutorial Room
While Python is an extremely versatile language, it isn't exactly known for its blazing performance. When developing math-intensive applications, particularly on low-power devices such as single-board computers, this can become a real issue. This talk provides an introduction to vectorization and libraries designed to support it (such as Numpy), giving you the tools you need to eliminate those pesky for loops and achieve a drastic performance boost.
2019-Nov-17 | 16:05–16:30
Talk room #1
Although known by most, pull requests are often not handled in the most effective way. Believe it or not, there are teams that don’t review code at all! People may assume that a senior developer is experienced enough not to make mistakes, or that merely changing those 3 lines of code couldn’t possibly harm the system. In these cases, it’s not uncommon to skip the code review in order to save time. Unreviewed (or badly reviewed) code can be extremely dangerous, resulting in huge risks and unpredictable behavior. One survey found that, on average, developers spend 45% of their time fixing bugs and technical debt, when they could be developing new features instead. Defining simple guideline files, adopting certain behaviors and setting up repository configurations are steps that can increase code review performance manyfold (in both time and quality). Using review tools both on the server (e.g. Heroku Review Apps) and locally (e.g. linters) can also greatly speed up the process. Creating templates and checklists ensures no step is overlooked or forgotten. The list goes on, but enough spoilers for now. Attendees will learn specific tips, tools, processes and recommended practices compiled from research and real-life use cases (both from my experience and from big players like Django, Facebook, Mozilla, etc.), along with some survey data that demonstrates why reviewing code is important.
Talk room #2
Want to take your career to the next level? Aren’t sure you’re doing what it takes to grow? In this talk we’ll discuss how the attitudes of curiosity (learning for the sake of learning), bravery (following your fears), dissatisfaction (discontentment as motivation for improvement) and reflection (taking time to absorb what you’ve learned) can help you continually grow and level up.
PyData Track (Round Room)
Ever gotten excited about a piece of new machine learning research you saw on arXiv or your favorite research lab’s blog, hoping it will finally solve that last bit of optimization you need in your own work and make you the ML superstar of your team? But after spending days trying to get the same results, you end up failing despite having tried everything in the paper, including looking through their GitHub page, contacting the authors, etc. If this sounds familiar, you’re not alone! Every day, researchers and practitioners alike spend countless hours trying to replicate results from new ML research, only to lose precious time and compute resources failing to achieve the required results. We’re facing a massive reproducibility crisis in the field of machine learning.

There has been a rise in easy-to-use tools for developing machine learning (ML) based solutions; AutoML and Keras are two of many. At the same time, many more public datasets are available, aimed at socially oriented research. With more people entering the field from diverse backgrounds, not all of them necessarily adhere to rigorous standards of scientific research, as evidenced by recent calls from the technical research community at conferences like NeurIPS. A lack of reproducibility in ML research will be a key hindrance to the meaningful use of R&D resources, and there is currently no comprehensive framework for doing reproducible machine learning. We, as Pythonistas, can do something to help!

Drawing on my own work in this domain and the work of the intern cohort on the Reproducibility in Machine Learning project this summer at the Montreal AI Ethics Institute, we’ll talk through some of the social and technical aspects of this problem and how you can take these principles away from the talk and become the superhero of your ML team, elevating the quality of your team’s work and helping others build on top of it. We’ll walk through the principles and apply them to a case study to understand how this simple yet effective mechanism can help address many of the issues we face in the field. Our framework combines existing tooling with policy applied to solution design, data collection, model development, data and model legacy, and deployment performance tracking.
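The framework itself is presented in the talk; as one small, hedged example of the technical side, pinning random seeds and recording the environment is a common first step toward reproducible experiments:

```python
# Small, generic reproducibility sketch (one of many practices; not the full
# framework discussed in the talk): fix random seeds and record the environment.
import json
import platform
import random
import sys

import numpy as np

SEED = 2019
random.seed(SEED)
np.random.seed(SEED)
# If a deep-learning framework is used, its own seed must be set too,
# e.g. tf.random.set_seed(SEED) or torch.manual_seed(SEED).

run_metadata = {
    "seed": SEED,
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
}
with open("run_metadata.json", "w") as fh:
    json.dump(run_metadata, fh, indent=2)   # store alongside results for later audits
```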
Tutorial (Clipper Room)
This is open space time, available for group meetings. Please sign up using the 2019 wiki.
2019-Nov-17 | 16:30–16:45 | Closing remarks
2019-Nov-17 | 16:45–18:00 | Mingle time