Build and Program FPGA-Based Designs Quickly with Python and Jupyter Notebooks

By Stephen Evanczuk

Contributed By Digi-Key's North American Editors

Designers have traditionally turned to field programmable gate arrays (FPGAs) to accelerate performance in hardware designs for compute-intensive applications such as computer vision, communications, industrial embedded systems, and increasingly the Internet of Things (IoT). Yet, the detailed steps involved in conventional FPGA programming have been prohibitive, encouraging designers to seek alternative processing solutions, until now.

The emergence of the Python Productivity for Zynq (PYNQ) development environment based on Jupyter notebooks addresses the issue of FPGA programmability. Using a development board designed specifically to support PYNQ, developers with little FPGA experience can rapidly implement designs able to take full advantage of FPGA performance for speeding compute-intensive applications.

This article will describe the typical FPGA approach before introducing and showing how to get started on a development board from Digilent that provides a powerful open source alternative for speeding development of FPGA-based systems.

Why FPGAs?

Engineers who need to employ complex, compute-intensive algorithms often rely on FPGAs to accelerate execution without compromising tight power budgets. In fact, FPGAs have emerged as a dominant platform for speeding artificial intelligence algorithms in edge-computing systems (see, “Use FPGAs to Build High-Performance Embedded Vision Applications with Machine Learning”).

Designed specifically for embedded applications, more advanced FPGA system-on-chip (SoC) devices integrate programmable logic (PL) fabric with a microcontroller. For example, the Xilinx Zynq-7000 SoC combines a dual-core Arm® Cortex®-A9 processor system with up to 444,000 logic cells in its integrated programmable logic (PL) fabric (Figure 1). Along with its built-in processors and an extensive complement of peripherals, the Zynq SoC offers up to 2,020 digital signal processing (DSP) blocks, or slices. Using these resources, developers can configure the PL fabric into specialized processing chains needed to speed throughput in complex compute-intensive algorithms.

Diagram of Xilinx Zynq-7000 SoC

Figure 1: The Xilinx Zynq-7000 SoC combines a dual-core Arm Cortex-A9 processor, programmable logic fabric, and an extensive set of peripherals and interfaces needed in many embedded applications. (Image source: Xilinx)

Besides reducing parts count, integration of the processors and PL fabric allows operations to occur across on-chip buses rather than through off-chip access. This integration further simplifies the critical task of loading the PL fabric during power-on or reset sequences.

In a typical microcontroller-based system built with an FPGA, developers needed to manage the sequence and security for loading bitstreams that program the FPGA. With the Zynq SoC, an integrated processor performs a conventional microcontroller’s tasks, including managing the PL fabric and other on-chip peripherals. As a result, the FPGA loading process more closely resembles that of a conventional microcontroller’s boot process than a traditional FPGA bitstream initialization.

This boot process occurs through a short sequence of steps managed by one of the Zynq’s processors (Figure 2). At power-on or reset, the boot process starts when a Zynq processor executes a short piece of code from its read-only BootROM to fetch the actual boot code from a boot device. Along with code for configuring the processor system components, the boot code includes the PL bitstream as well as the user application. When boot code loading completes, the processor uses the included bitstream to configure the PL. After configuration and PL configuration are completed, the device begins executing the application included in the boot code.

Image of Xilinx Zynq-7000 SoC  boot sequence

Figure 2: In a boot sequence similar to conventional microcontrollers, a Xilinx Zynq-7000 SoC runs code from Boot ROM that loads and executes the boot loader, which handles subsequent stages including using a bitstream packaged in the boot code to configure the programmable logic fabric. (Image source: Xilinx)

Even with simplified PL loading processing, developers have in the past been left to deal with the complex FPGA development process needed to generate the required bitstreams. For developers hoping to leverage FPGA performance, the conventional FGPA development process has remained a significant barrier to implementation. Xilinx effectively removed that barrier with its PYNQ environment.

PYNQ environment

In PYNQ, PL bitstreams are encapsulated in pre-built libraries called overlays, which serve a similar role as software libraries in the development process and execution environment. During the boot load process, bitstreams associated with the required overlays configure the PL fabric. However, this process remains transparent to developers who take advantage of the overlay’s functionality through the Python application programming interface (API) associated with each overlay. During development, engineers can combine software libraries and overlays as needed, working through their respective APIs to implement the application. During execution the processor system executes software library code as usual, while the PL fabric implements the functionality provided in the overlay. The result is the kind of accelerated performance that continues to drive interest in FPGA-based designs for increasingly demanding applications.

As the name suggests, PYNQ takes advantage of the development productivity gains associated with the Python programming language. Python has emerged as one of the top languages not only because of its relative simplicity, but also because of its large and growing ecosystem. Developers are likely to find software libraries needed for support services or specialized algorithms in repositories of open source Python modules. At the same time, developers can implement critical functions in C language because PYNQ uses the common C language implementation of the Python interpreter. This implementation provides easy access to thousands of existing C libraries and simplifies use of developer provided C language libraries. Although experienced developers can extend PYNQ with specialized hardware overlays and C language software libraries, PYNQ’s strength lies in its ability to provide a high productivity development environment for any developer able to build a Python program.

Itself an open source project, PYNQ builds on another open source project, the Jupyter notebook. Jupyter notebooks provide a particularly effective environment for interactively exploring algorithms and prototyping complex applications in Python or any of the other supported programming languages, currently numbering over 40. Developed through community consensus under Project Jupyter, a Jupyter notebook combines lines of executable code with descriptive text and graphics. This capability allows individual developers to more effectively document their progress without moving to another development environment. For example, a developer can use a notebook that combines a few lines of code needed to view the data with the graphic generated by the code (Figure 3).

Image of Jupyter notebook from a Xilinx sample repository

Figure 3: A Jupyter notebook from a Xilinx sample repository combines descriptive text, executable code, and an output associated with an application. (Image source: Xilinx)

The ability to contain code, output, and descriptive text is possible because a Jupyter notebook is a live document maintained in an interactive development environment provided by a Jupyter notebook server (Figure 4). In a Jupyter session, the server renders the notebook file in a conventional Web browser using HTTP, and a combination of HTTP and Websockets protocols for the static and dynamic content in the rendered document. On the back end, the server communicates with a code execution kernel using the open source ZeroMQ (ØMQ) messaging protocol.

Diagram of Jupyter session workflow

Figure 4: In a Jupyter session, a notebook server renders the contents of a notebook file to a Web browser while interacting with a backend kernel that executes the code. (Image source: Project Jupyter)

In edit mode, the user can modify the text and code. In turn, the server updates the corresponding notebook file, which is a text file comprising a series of JSON key/value pairs. These pairs are called cells in the Jupyter environment. For example, the Web browser display of the Jupyter notebook shown earlier comprises a few cells for code and markdown text (Listing 1).

   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Error plot with Matplotlib\n",
    "This example shows plots in notebook (rather than in separate window)."
  },  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   "outputs": [
     "data": {
      "image/png": "iVBORw0KGgoAAAA[truncated]",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x2f85ef50>"
     "metadata": {},
     "output_type": "display_data"
   "source": [
    "%matplotlib inline\n",
    "    \n",
    "X = np.arange(len(values))\n",
    " + 0.0, values, facecolor='blue', \n",
    "        edgecolor='white', width=0.5, label=\"Written_to_DAC\")\n",
    " + 0.25, samples, facecolor='red', \n",
    "        edgecolor='white', width=0.5, label=\"Read_from_ADC\")\n",
    "plt.title('DAC-ADC Linearity')\n",
    "plt.legend(loc='upper left', frameon=False)\n",

Listing 1: A Jupyter notebook is a text file containing a series of JSON key/value pairs containing code sections, markup, and output such as these, which correspond to the rendered page shown in Figure 3. Note that the string corresponding to the .png image in that figure has been truncated here for presentation purposes. (Code source: Xilinx)

Aside from its documentation features, the power of the Jupyter environment lies in its ability to interactively execute code cells. Developers simply select the cell of interest in their browser (blue border in Figure 3) and click the run button in the Jupyter menu on the top of their browser window. In turn, the Jupyter notebook server hands off the corresponding code cell to a code execution kernel, which is the interactive Python (IPython) kernel in the PYNQ environment. Following code execution, the server asynchronously updates both the rendered Web page and notebook file with any output generated by the kernel.

PYNQ extends this same approach to FPGA-based development by embedding the Jupyter framework including IPython kernel and notebook Web server on the Zynq SoC’s Arm processors. The pynq Python module included in the environment provides programmers with the Python API needed to access PYNQ services in Python programs.

FPGA development environment

Designed specifically to support PYNQ, the Digilent PYNQ-Z1 development kit lets developers quickly begin exploring FPGA-accelerated applications simply by loading the available PYNQ bootable Linux image. The PYNQ-Z1 board combines a Xilinx XC7Z020 Zynq SoC with 512 megabytes (Mbytes) of RAM, 16 Mbytes of flash, and a microSD slot for additional external flash memory. Along with switches, buttons, LEDs, and multiple input/output ports, the board also provides connectors for expansion to third-party hardware through the Digilent Pmod (peripheral module) interface and through Arduino shields and Digilent chipKIT shields. The board also brings out the Zynq SoC’s analog-to-digital converter (ADC), called the XADC, as six single-ended analog input ports or four differential analog input ports. Digilent also supplies the separate PYNQ-Z1 productivity kit that includes a power supply, a micro USB cable, a microSD card preloaded with a PYNQ image, and an Ethernet cable to update or add Python modules.

For the developer, the full capabilities of the SoC and board are readily available through a Jupyter notebook. For example, accessing the board’s Pmod interface to read the ADC and write digital-to-analog converter (DAC) values in a loopback test requires only a few lines of code (Figure 5). After importing the required Python modules, the SoC PL is initialized with a “base” overlay (cell two in Figure 5). Like a conventional board support package, this base overlay provides access to board peripherals.

Image of Jupyter notebook included in the Xilinx sample repository

Figure 5: A Jupyter notebook included in the Xilinx sample repository demonstrates the simple design pattern associated with accessing hardware services for input/output transactions. (Image source: Xilinx)

Developers need only call imported modules to read and write the values (cell three in the Figure). In the sample notebook shown, the notebook server issues each cell in sequence and updates the notebook with generated results. In this case, the only output value is 0.3418, but any execution errors will show up as normal Python traceback stacks in line with their respective code cells.

Building complex applications

Combined with the wide array of available Python modules, this deceptively simple approach to embedded applications development masks a potent platform for rapidly implementing complex, compute-intensive applications. For example, developers can quickly implement a face detection webcam using the PYNQ-Z1 HDMI input and popular OpenCV computer vision library. After loading the base overlay and webcam interface, developers initialize an OpenCV camera object videoIn (Figure 6). Reading the video image is then as simple as a call to, returning frame_vga in this example.

Image of Jupyter notebook included in the Xilinx sample repository

Figure 6: A Jupyter notebook included in the Xilinx sample repository shows how developers can quickly build a webcam face recognition system by combining hardware resources of the PYNQ-Z1 development board with powerful image processing functions available in the OpenCV library (cv2). (Image source: Xilinx)

In a subsequent step, managed as a separate cell in the notebook, developers create OpenCV (cv2) classifier objects using preset criteria and add bounding boxes to identify features (green for eyes and blue for faces in this example). In another pair of cells, the application completes after displaying the output using the board’s HDMI output (Figure 7).

Image of final cells in the Xilinx webcam face detection notebook

Figure 7: The final cells in the Xilinx webcam face detection notebook demonstrate use of OpenCV classifiers, whose results are used to add bounding boxes to the original images and displayed using the PYNQ-Z1 development board’s HDMI output port. (Image source: Xilinx)

The ability to interactively build, test, and share discussion about complex software has made Jupyter notebooks a favorite among scientists and engineers working to optimize algorithms for artificial intelligence applications. As the work evolves, the notebook not only shows code and its output, but also the developers’ analysis about the results, providing a kind of computational narrative that can be shared among team members and colleagues.

Yet, developers need to understand that notebooks may be unlikely repositories for more production oriented efforts. For example, their inclusion of large hexadecimal encoded strings for image data (see truncated section in Listing 1) not only increases document size, but can complicate difference methods used by typical source version control systems. The interlacing of code and non-functional text can further complicate migration of code created in early analytical stages to production level development processes. For code exploration and rapid prototyping, however, Jupyter notebooks offer a powerful development environment.


FPGAs provide a necessary performance boost needed to meet the increasing demands of embedded systems designed for the IoT, computer vision, industrial automation, automotive, and many more. Although conventional FPGA development methodologies have remained obstacles for many developers, the emergence of the Python-based PYNQ development environment based on Jupyter notebooks offers an effective alternative. Using a development board designed specifically to support PYNQ, developers with little FPGA experience can rapidly implement designs able to take full advantage of FPGA performance for speeding compute-intensive applications.

Disclaimer: The opinions, beliefs, and viewpoints expressed by the various authors and/or forum participants on this website do not necessarily reflect the opinions, beliefs, and viewpoints of Digi-Key Electronics or official policies of Digi-Key Electronics.

About this author

Stephen Evanczuk

Stephen Evanczuk has more than 20 years of experience writing for and about the electronics industry on a wide range of topics including hardware, software, systems, and applications including the IoT. He received his Ph.D. in neuroscience on neuronal networks and worked in the aerospace industry on massively distributed secure systems and algorithm acceleration methods. Currently, when he's not writing articles on technology and engineering, he's working on applications of deep learning to recognition and recommendation systems.

About this publisher

Digi-Key's North American Editors