Building OpenAI API-Based Java GenAI Applications—A Guide to the Deitel Videos on the O’Reilly Online Learning Subscription Site

[Image: The OpenAI APIs we demonstrate]

[Estimated reading time for this document: 20 minutes. Estimated time to watch the linked videos and run the Java code: 5 hours. Please share this guide with your friends and colleagues who might find it helpful.]

This comprehensive guide overviews Lesson 19, Building OpenAI API-Based Java Generative AI Applications, from my Java Fundamentals video course on O’Reilly Online Learning. The lesson focuses on building Java apps using OpenAI’s generative AI (genAI) APIs and the official openai-java library. This document guides you through my hands-on code examples and provides “Try It” exercises for experimenting with the APIs. You’ll leverage the OpenAI APIs to create intelligent, multimodal apps that understand, generate and manipulate text, code, images, audio and video content.

This guide links you to 34 videos totaling about 4.75 hours in Lesson 19 of our Java Fundamentals video course, in which I present fully coded Java genAI apps that use the OpenAI APIs to

  • summarize documents
  • determine text’s sentiment (positive, neutral or negative)
  • use vision capabilities to generate accessible image descriptions
  • translate text among spoken languages
  • generate and manipulate Java code
  • extract named entities (such as people, places, organizations, dates, times, events, products, …) from text
  • transcribe speech to text
  • synthesize speech from text, using one of OpenAI’s 11 voices and prompts that control style and tone
  • create original images
  • transfer art styles to images via text prompts
  • transfer styles between images
  • generate video closed captions
  • filter inappropriate content
  • generate and remix videos (under development at the time of this writing—uses OpenAI’s recently released Sora 2 API)
  • build agentic AI apps (under development at the time of this writing—uses OpenAI’s recently released AgentKit)

The remaining videos overview concepts and present genAI prompt and coding exercises you can use to dig deeper into the covered topics.

How We Formed This Guide

We converted this document from our corresponding Python version. The initial Python draft was created using five genAIs: OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, Microsoft’s Copilot and Perplexity. We provided each with

  • a detailed prompt and
  • a Chapter 18 draft from our forthcoming Python for Programmers, 2/e product suite.

We asked Claude to summarize the results, tuned that summary to create the Python version of this guide, then updated it for the Java videos discussed in this document.

Contacting Me with Questions

The OpenAI APIs are evolving rapidly. If you run into problems while working through the examples or find that something has changed, check the Deitel blog or email paul@deitel.com.

Downloading the Code

Go to the Java Fundamentals, 3/e GitHub Repository to get the source code that accompanies the videos referenced in this guide. The OpenAI API examples are located in the examples/ch19 folder.

Suggested Learning Workflow

If you watch the videos, you’ll get a code-example-rich intro to programming with the OpenAI APIs. To learn how to work with various aspects of the OpenAI APIs, I suggest that you:

  • Watch the video for each example.
  • Run the provided Java code.
  • Complete the “Try It” coding challenges.
  • Experiment by creatively combining APIs (e.g., transcribe audio then translate, or generate images with accessibility descriptions).

Key Takeaways

This comprehensive guide and the corresponding videos present practical skills for harnessing the power of OpenAI’s genAI APIs. You’ll:

  • Master OpenAI APIs in Java and perform creative prompt engineering.
  • Build complete, functional, multimodal apps that create and manipulate text, code, images, audio and video.
  • Implement responsible accessibility and content moderation practices.

Caution: GenAIs make mistakes and even “hallucinate.” You should always verify their outputs.

Introduction

This video overviews the required official openai-java library, OpenAI’s fee-based API model, and monitoring and managing API usage costs.

OpenAI APIs

This video overviews the OpenAI APIs and models I’ll demo in this lesson.

  • Video: OpenAI APIs (1m 36s)
  • OpenAI Documentation: API Reference
  • Try It: Browse the OpenAI API documentation and review the API subcategories.
  • Try It: Prompt genAIs for an overview of responsible AI practices.

OpenAI Developer Account and API Key

Here, you’ll learn how to create your OpenAI developer account, generate an API key and securely store it in an environment variable. This required setup step enables your apps to authenticate with OpenAI so they can make API calls. You’ll understand best practices for securing your API key. The OpenAI API is a paid service. Even if you prefer not to use the paid APIs for now, reading this document, watching the videos and studying the code is still valuable.
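
Before making any API calls, it’s worth confirming that the key is actually available to your program. Here’s a minimal sketch; it assumes the environment variable is named OPENAI_API_KEY, which is the name the openai-java library looks for:

     // CheckApiKey.java -- minimal sketch: confirm the OPENAI_API_KEY environment
     // variable is set before your app attempts to authenticate with OpenAI.
     public class CheckApiKey {
         public static void main(String[] args) {
             String key = System.getenv("OPENAI_API_KEY");

             if (key == null || key.isBlank()) {
                 System.err.println(
                     "OPENAI_API_KEY is not set. Create the environment variable first.");
                 System.exit(1);
             }

             // never print the full key; show just enough to confirm it was found
             System.out.printf("API key found (starts with %s...)%n",
                 key.substring(0, Math.min(3, key.length())));
         }
     }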

Text Generation Via the Responses API

My text-generation examples introduce the Responses API, OpenAI’s primary text-generation interface. I show how to structure prompts, configure parameters, invoke the API and interpret responses. This API enables sophisticated conversational AI applications and is the foundation for many text-based genAI tasks.
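
If you’d like to see the overall shape of a Responses API call before watching the videos, here is a minimal sketch based on the openai-java library’s documented builder pattern. The class and accessor names (OpenAIOkHttpClient, ResponseCreateParams, ChatModel.GPT_4O and the output-traversal calls) follow the library’s README and may differ slightly across versions:

     // ResponsesDemo.java -- sketch of a simple Responses API call with openai-java.
     // Assumes the OPENAI_API_KEY environment variable is set.
     import com.openai.client.OpenAIClient;
     import com.openai.client.okhttp.OpenAIOkHttpClient;
     import com.openai.models.ChatModel;
     import com.openai.models.responses.ResponseCreateParams;

     public class ResponsesDemo {
         public static void main(String[] args) {
             // fromEnv() reads OPENAI_API_KEY from the environment
             OpenAIClient client = OpenAIOkHttpClient.fromEnv();

             // configure the model and the prompt
             ResponseCreateParams params = ResponseCreateParams.builder()
                 .model(ChatModel.GPT_4O)
                 .input("Explain generative AI in two sentences.")
                 .build();

             // invoke the API and print the text of each output item
             client.responses().create(params).output().stream()
                 .flatMap(item -> item.message().stream())
                 .flatMap(message -> message.content().stream())
                 .flatMap(content -> content.outputText().stream())
                 .forEach(text -> System.out.println(text.text()));
         }
     }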

Text Summarization

In this lengthy video, I provide the foundation you’ll need to work with openai-java in the subsequent examples. I use OpenAI’s natural language understanding capabilities to condense documents into concise summaries. I discuss crafting summarization prompts to control summary size and style. Text summarization is invaluable for efficiently processing large documents, articles and reports.
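
As a taste of the prompt crafting discussed in the video, the sketch below composes a summarization prompt that controls both the summary’s length and its style; you’d pass the resulting string as the input of a Responses API call like the one shown earlier. The wording of the prompt is only an illustration, not the exact prompt used in the video:

     // SummarizationPrompt.java -- sketch: compose a prompt that controls a
     // summary's size and style, then pass the result to the Responses API.
     public class SummarizationPrompt {
         // hypothetical helper; not the exact prompt used in the videos
         public static String buildSummarizationPrompt(
                 String document, int maxSentences, String style) {
             return """
                 Summarize the following document in at most %d sentences.
                 Use a %s style and do not add information that is not in the document.

                 Document:
                 %s""".formatted(maxSentences, style, document);
         }

         public static void main(String[] args) {
             String prompt = buildSummarizationPrompt(
                 "OpenAI's APIs let Java developers add text, image and audio " +
                 "capabilities to their applications...", 2, "plain, non-technical");
             System.out.println(prompt);
         }
     }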

Sentiment Analysis

This example uses OpenAI’s natural language understanding capabilities to analyze a text’s emotional tone and sentiment. It classifies text as positive, neutral or negative and asks the model to explain how it came to that conclusion.

  • Video: Sentiment Analysis (6m 48s)
  • Try It: Build a sentiment analyzer that classifies the sentiment of customer reviews and asks the genAI model to provide a confidence score from 0.0 to 1.0 for each, indicating the likelihood that the classification is correct. Confidence scores closer to 1.0 are more likely to be correct.

Vision: Accessible Image Descriptions

Here, I use OpenAI’s vision capabilities to generate brief and detailed image descriptions, making images accessible to users who are blind or have low vision.

Try It: Create an application that takes URLs for various images and generates both brief and comprehensive accessible descriptions suitable for screen readers.
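
To get started, here’s a minimal sketch of the underlying REST call using only the JDK’s java.net.http package (the videos use the openai-java library instead). The model name and the example image URL are placeholders you’d replace:

     // ImageDescription.java -- sketch: ask the Responses API to describe an image
     // via a raw REST call with the JDK's HttpClient. Assumes OPENAI_API_KEY is set;
     // the model name and image URL are placeholders.
     import java.net.URI;
     import java.net.http.HttpClient;
     import java.net.http.HttpRequest;
     import java.net.http.HttpResponse;

     public class ImageDescription {
         public static void main(String[] args) throws Exception {
             String body = """
                 {"model": "gpt-4o",
                  "input": [{"role": "user",
                             "content": [
                               {"type": "input_text",
                                "text": "Describe this image for a screen-reader user."},
                               {"type": "input_image",
                                "image_url": "https://example.com/photo.jpg"}]}]}""";

             HttpRequest request = HttpRequest.newBuilder()
                 .uri(URI.create("https://api.openai.com/v1/responses"))
                 .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                 .header("Content-Type", "application/json")
                 .POST(HttpRequest.BodyPublishers.ofString(body))
                 .build();

             HttpResponse<String> response = HttpClient.newHttpClient()
                 .send(request, HttpResponse.BodyHandlers.ofString());
             System.out.println(response.body()); // JSON containing the description
         }
     }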

Language Detection and Translation

In this example, I use OpenAI’s multilingual capabilities to detect the language in which text is written and translate it to other spoken languages.

Code Generation

Discover how AI can generate, explain, and debug code across multiple programming languages. The first video covers code generation, understanding AI-generated code quality, and using AI as a coding assistant. In the second video, I discuss how genAIs can assist you with coding, including code generation, testing, debugging, documenting, refactoring, performance tuning, security and more.

Named Entity Recognition (NER) and Structured Outputs

In this example, I use OpenAI’s natural language understanding capabilities and named entity recognition to extract structured information from unstructured text, identifying entities such as people, places, organizations, dates, times, events, products and more. The example shows that OpenAI’s APIs can return outputs as formatted JSON (JavaScript Object Notation) that’s readable by both humans and computers. NER is essential for building applications that process and organize information from documents and text sources.
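
To give you a feel for consuming such structured output in Java, here’s a minimal sketch that maps a JSON response into record classes with the Jackson library (which openai-java itself uses). The record names and the sample JSON are hypothetical; in a real app, the JSON string would come from the API response:

     // EntityParser.java -- sketch: map the model's JSON structured output into
     // Java records with Jackson. The sample JSON and record names are hypothetical;
     // in the real example, the JSON comes from the Responses API.
     import com.fasterxml.jackson.databind.ObjectMapper;
     import java.util.List;

     public class EntityParser {
         public record Entity(String text, String type) {}
         public record Entities(List<Entity> entities) {}

         public static void main(String[] args) throws Exception {
             String modelOutput = """
                 {"entities": [{"text": "Paul Deitel", "type": "PERSON"},
                               {"text": "OpenAI", "type": "ORGANIZATION"}]}""";

             Entities result = new ObjectMapper().readValue(modelOutput, Entities.class);
             result.entities().forEach(
                 e -> System.out.printf("%s -> %s%n", e.text(), e.type()));
         }
     }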

  • Video: Named Entity Recognition (NER) and Structured Outputs (19m 50s)
  • Video: NER and Structured Outputs: Code and Prompt Exercises (5m 22s)
  • OpenAI Documentation: Structured Model Outputs Guide
  • Try It: Modify the NER example to perform parts-of-speech (POS) tagging—identifying each word’s part of speech (e.g., noun, verb, adjective) in a sentence. Use genAIs to research the commonly used tag sets for POS tagging. Prompt the model to return a structured JSON response with the parts of speech for the words in the supplied text. Display each word with its part of speech. Use these record classes (with java.util.List imported):
    public record PartOfSpeech(String text, String part) {}
    public record PartsOfSpeech(List<PartOfSpeech> parts) {}
  • Try It: Modify the NER example to translate text into multiple languages and display the results. Prompt the model to translate the text it receives to the specified languages and to return only JSON-structured data in the following format:
    {
       "original_text": original_text_string,
       "original_language": original_text_language_code,
       "translations": [
          {
             "language": translated_text_language_code,
             "translation": translated_text_string
          }
       ]
    }
  • Try It: Create an NER tool that extracts and displays key entities from news articles.

Speech Recognition and Speech Synthesis

This video introduces speech-to-text transcription and text-to-speech conversion (speech synthesis) concepts for working with audio input and output in your apps. You’ll understand the models used in the transcription and synthesis examples, and explore the speech voices via OpenAI’s voice demo site—https://openai.fm.

English Speech-to-Text (STT) for Audio Transcription

In this example, I convert spoken audio to text. Speech-to-text technology enables applications like automated transcription services, voice commands, and accessibility features.
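
Here’s a rough sketch of what a transcription call looks like with openai-java. The class and method names (such as TranscriptionCreateParams, AudioModel.WHISPER_1 and asTranscription()) are assumptions based on the library’s documented builder pattern and its published examples; check the openai-java documentation for the exact names in your version:

     // TranscribeAudio.java -- sketch of a speech-to-text call with openai-java.
     // Class and method names are assumptions based on the library's examples;
     // verify them against the openai-java documentation for your version.
     import com.openai.client.OpenAIClient;
     import com.openai.client.okhttp.OpenAIOkHttpClient;
     import com.openai.models.audio.AudioModel;
     import com.openai.models.audio.transcriptions.Transcription;
     import com.openai.models.audio.transcriptions.TranscriptionCreateParams;
     import java.nio.file.Paths;

     public class TranscribeAudio {
         public static void main(String[] args) {
             OpenAIClient client = OpenAIOkHttpClient.fromEnv();

             TranscriptionCreateParams params = TranscriptionCreateParams.builder()
                 .file(Paths.get("speech.mp3")) // placeholder audio file
                 .model(AudioModel.WHISPER_1)
                 .build();

             // the transcriptions endpoint returns the recognized text
             Transcription transcription =
                 client.audio().transcriptions().create(params).asTranscription();
             System.out.println(transcription.text());
         }
     }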

Text-To-Speech (TTS)

In this example, I convert written text into natural-sounding speech with one of OpenAI’s 11 voice options. I discuss selecting voice options, specifying speech style and tone, and generating audio files. Text-to-speech technology is crucial for creating voice assistants, audiobook generation, and accessibility applications.
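
For orientation, here’s a minimal sketch of the underlying text-to-speech REST call using only the JDK’s HttpClient (the videos use the openai-java library). The model name, voice and instructions shown are placeholders you can change:

     // SynthesizeSpeech.java -- sketch: convert text to speech via a raw REST call
     // and save the returned audio to an MP3 file. Assumes OPENAI_API_KEY is set;
     // the model, voice and instructions values are placeholders.
     import java.net.URI;
     import java.net.http.HttpClient;
     import java.net.http.HttpRequest;
     import java.net.http.HttpResponse;
     import java.nio.file.Path;

     public class SynthesizeSpeech {
         public static void main(String[] args) throws Exception {
             String body = """
                 {"model": "gpt-4o-mini-tts",
                  "voice": "coral",
                  "instructions": "Speak in a warm, friendly tone.",
                  "input": "Welcome to Java Fundamentals!"}""";

             HttpRequest request = HttpRequest.newBuilder()
                 .uri(URI.create("https://api.openai.com/v1/audio/speech"))
                 .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                 .header("Content-Type", "application/json")
                 .POST(HttpRequest.BodyPublishers.ofString(body))
                 .build();

             // the response body is the audio itself; write it straight to a file
             HttpClient.newHttpClient().send(
                 request, HttpResponse.BodyHandlers.ofFile(Path.of("welcome.mp3")));
             System.out.println("Saved welcome.mp3");
         }
     }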

Image Generation

Here, I create original images from text descriptions using OpenAI’s latest image-generation model. Image generation opens possibilities for creative content, design mockups, and visual storytelling.
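
To show the shape of an image-generation request, here’s a minimal sketch of the REST call with the JDK’s HttpClient (the videos use the openai-java library). The model name is a placeholder, and depending on the model the response’s "data" array contains either a URL for the image or base64-encoded image data you’d decode and save:

     // GenerateImage.java -- sketch: request an original image from a text prompt
     // via a raw REST call. Assumes OPENAI_API_KEY is set; the model name is a
     // placeholder. Inspect the response's "data" array for the generated image.
     import java.net.URI;
     import java.net.http.HttpClient;
     import java.net.http.HttpRequest;
     import java.net.http.HttpResponse;

     public class GenerateImage {
         public static void main(String[] args) throws Exception {
             String body = """
                 {"model": "gpt-image-1",
                  "prompt": "A watercolor painting of a lighthouse at sunset",
                  "size": "1024x1024"}""";

             HttpRequest request = HttpRequest.newBuilder()
                 .uri(URI.create("https://api.openai.com/v1/images/generations"))
                 .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                 .header("Content-Type", "application/json")
                 .POST(HttpRequest.BodyPublishers.ofString(body))
                 .build();

             HttpResponse<String> response = HttpClient.newHttpClient()
                 .send(request, HttpResponse.BodyHandlers.ofString());
             System.out.println(response.body()); // "data" holds the image URL or bytes
         }
     }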

Image Style Transfer

In two examples, I apply artistic styles to existing images using the Images API’s edit capability with style-transfer prompts and the Responses API’s image-generation tool to transfer the style of one image to another.

Generating Closed Captions from a Video’s Audio Track

In this example, I generate closed captions from a video file’s audio track using OpenAI’s audio transcription capabilities. Closed captions enhance video accessibility and improve content searchability. This example covers caption formatting standards, audio extraction techniques and using the OpenAI Whisper-1 model, which supports generating captions with timestamps. I then use the cross-platform VLC Media Player to overlay the closed captions on the corresponding video.

Content Moderation

Here, I use OpenAI’s Moderation APIs to detect and filter inappropriate or harmful text and images—essential techniques for platforms hosting user-generated content. I present moderation categories and severity levels, demonstrate the Moderation API with text inputs and discuss image moderation.
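
Here’s a minimal sketch of a text-moderation REST call with the JDK’s HttpClient (the videos use the openai-java library; the model name is a placeholder for whichever moderation model you choose). The JSON response reports whether the input was flagged, plus per-category scores:

     // ModerateText.java -- sketch: check text with OpenAI's moderation endpoint
     // via a raw REST call. Assumes OPENAI_API_KEY is set; the model name is a
     // placeholder. The response JSON includes a "flagged" boolean and
     // per-category scores.
     import java.net.URI;
     import java.net.http.HttpClient;
     import java.net.http.HttpRequest;
     import java.net.http.HttpResponse;

     public class ModerateText {
         public static void main(String[] args) throws Exception {
             String body = """
                 {"model": "omni-moderation-latest",
                  "input": "Text to check for inappropriate content goes here."}""";

             HttpRequest request = HttpRequest.newBuilder()
                 .uri(URI.create("https://api.openai.com/v1/moderations"))
                 .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                 .header("Content-Type", "application/json")
                 .POST(HttpRequest.BodyPublishers.ofString(body))
                 .build();

             HttpResponse<String> response = HttpClient.newHttpClient()
                 .send(request, HttpResponse.BodyHandlers.ofString());
             System.out.println(response.body());
         }
     }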

Sora 2 Video Generation

Coming soon: This video introduces OpenAI’s recently released Video API. I use the Sora 2 model in prompt-to-video, image-to-video and video-remixing examples. I will add these videos to the lesson as soon as I complete them.

  • Video: Coming soon.
  • OpenAI Documentation: Video Generation with Sora Guide
  • OpenAI Documentation: Videos API
  • Try It: Experiment with text-to-video prompts and explore the creative possibilities of AI video generation.

Closing Note

As I develop additional OpenAI API-based apps, I will add new videos to the Building OpenAI API-Based Java GenAI Applications lesson of Java Fundamentals. Some new example possibilities include:

  • Generating and remixing videos with OpenAI’s Sora 2 API.
  • Using OpenAI’s Realtime Audio APIs for speech-to-speech apps.
  • Building AI agents with OpenAI’s AgentKit.
  • Single-tool AI agents.
  • Multi-tool AI agents.
  • Single-agent applications.
  • Multi-agent applications.
  • Managing AI conversations that maintain state between Responses API calls.

Try It: Review the course materials and start planning your own GenAI applications using the techniques learned. Enjoy!

Additional Resources

Are You Just Getting Started in Java Programming?

Are you just getting started with Java How to Program, 11/e, Early Objects version; Java 9 for Programmers; or Java How to Program, 11/e, Late Objects version? You will need to install the Java Development Kit (JDK).

Getting the JDK

Updated January 11, 2021

As of this writing, Java 15 is the current version and new versions are being released every 6 months—Java 16 is coming in March. For organizations interested in stable versions of Java, long-term support (LTS) versions are released every three years. The current LTS version is Java 11 (September 2018). The next LTS version will be Java 17 in September 2021.

Oracle, Inc.—Java’s gatekeeper—offers the JDK for download from oracle.com, but Oracle recently changed their licensing terms. Their JDK is meant primarily for corporate users.

For learning purposes, we recommend that you get your JDK from AdoptOpenJDK.net. Always read the software licenses for any software you install.

Once you’ve downloaded the installer for your operating system platform and the version of Java you intend to use, be sure to carefully follow the installation instructions for your platform (found further down the page).

Java FX for Graphical User Interfaces

Since Java 11, the graphical user interface (GUI) library we use in our Java books—Java FX—is no longer distributed as part of the Java Development Kit.

To run the first example in Chapter 1 and the examples in our later Java FX chapters, you’ll first need to install the Java FX Software Development Kit (SDK).

The Java FX SDK installation instructions are at https://openjfx.io/openjfx-docs/. You can download the JavaFX SDK from https://gluonhq.com/products/javafx/.

Be sure to download the version that matches your JDK version number and your platform and closely follow the installation instructions.

If you’re unsure what to download, please send us an email. You’ll need to set the PATH_TO_FX environment variable. Its value depends on where you place the SDK’s folder on your system and which version of the SDK you have. The samples below assume the Java FX SDK’s folder is in your user account’s Downloads folder. In the paths I show below, you need to replace

     “/Users/pauldeitel/Downloads/javafx-sdk-15.0.1”

or

     “c:\Users\pauldeitel\Downloads\javafx-sdk-15.0.1”

with the correct full path on your system and the JavaFX SDK version number for the specific version you downloaded.

Mac/Linux:

     export PATH_TO_FX=/Users/pauldeitel/Downloads/javafx-sdk-15.0.1/lib

Windows:

     set PATH_TO_FX="c:\Users\pauldeitel\Downloads\javafx-sdk-15.0.1\lib"

Compiling and Running the Painter App in Chapter 1

To compile the Painter app in Chapter 1, use the following command in your Command Prompt (Windows) or Terminal/shell (macOS or Linux)—Windows users should replace $PATH_TO_FX with %PATH_TO_FX%:

     javac --module-path $PATH_TO_FX --add-modules=javafx.controls,javafx.graphics,javafx.fxml *.java

To run the Painter app, use the following command—Windows users should replace $PATH_TO_FX with %PATH_TO_FX%:

     java --module-path $PATH_TO_FX --add-modules=javafx.controls,javafx.graphics,javafx.fxml Painter

If you’re having any trouble at all, please send us an email. We’re happy to help you get up and running!
