fennec/planning/LANGUAGE_PROCESSING.md
2025-08-05 16:14:00 -04:00

Language Processing Library (langproc)

Introduction

This library contains headers and classes for processing languages. This
includes ASCII/UTF-8/UTF-16 string processing, file formats, machine language,
and programming languages.

fennec should be able to process documentation embedded in files. The main ways
it will support this are Doxygen and LaTeX. Consider including binaries with releases.

String Analysis (langproc/strings)

fennec reimplements the C++ Strings Library as a submodule of this library,
because C++ std::string carries a lot of overhead. I would say that std::string
is a Jeep, while fennec::string is an F2 car: std::string covers a lot of use
cases but is slower, while fennec::string, like an F2 car, is barebones and
highly performant on the right surface.

Implementation

Symbol Implemented Passed
cstring
string
wcstring
wstring

File System (filesystem)

fennec does not reimplement the C++ I/O Library. Instead, it provides
C++ classes that handle file streams, directory streams, and file paths.
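
As a rough illustration of what a file path class could look like, here is a minimal sketch. The class name `path`, its member names, and the choice of `/` as the only separator are all assumptions made for this example; none of it is taken from fennec's actual headers.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical path class (not fennec's actual API). It stores the path as
// plain text and treats '/' as the only separator.
class path {
public:
    explicit path(std::string text) : text_(std::move(text)) {}

    const std::string& str() const { return text_; }

    // Everything after the last '/', or the whole text if there is none.
    std::string_view filename() const {
        const std::string_view v = text_;
        const auto slash = v.find_last_of('/');
        return slash == std::string_view::npos ? v : v.substr(slash + 1);
    }

    // The last ".ext" of the filename; empty for names without a dot and
    // for names that only start with one (e.g. ".hidden").
    std::string_view extension() const {
        const std::string_view name = filename();
        const auto dot = name.find_last_of('.');
        if (dot == std::string_view::npos || dot == 0) return {};
        return name.substr(dot);
    }

    // Everything before the last '/'; empty if there is no separator.
    // Simplification: the root directory "/" is not treated specially.
    std::string_view parent() const {
        const std::string_view v = text_;
        const auto slash = v.find_last_of('/');
        return slash == std::string_view::npos ? std::string_view{}
                                               : v.substr(0, slash);
    }

    // Appends a relative component, inserting a separator when needed.
    path join(std::string_view child) const {
        assert(!child.empty() && child.front() != '/'); // child must be relative
        std::string joined = text_;
        if (!joined.empty() && joined.back() != '/') joined += '/';
        joined.append(child.data(), child.size());
        return path(std::move(joined));
    }

private:
    std::string text_;
};
```

With this sketch, `path("assets/models").join("ship.obj")` yields `assets/models/ship.obj`, whose `extension()` is `.obj`. The accessors return views into the stored string, so they are only valid while the `path` object is alive.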

Implementation

Symbol Implemented Passed
path
file
directory

Interpreter (langproc/interpret)

This submodule will contain classes for interpreting data, particularly
through parsers. To support files that adhere to a given specification, the
following concepts will need to be implemented as classes:

Reading

  • Tokenization
    • Useful for text-based formats
  • Data Parser
    • Useful for binary-based formats
  • Lexical Analysis
    • Necessary for Syntax Coloring
  • Syntax Analysis
    • Necessary for Syntax Coloring
  • Semantic Analysis
    • Necessary for Code Completion
  • Intermediate Code Generation
    • Necessary for any custom programming language in fennec
  • Target Code Generation / Optimization?
    • Necessary for any custom programming language that needs to compile to binary
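
The first stage in the list above, tokenization, can be sketched roughly as follows. This is only an illustration under assumptions: the names `token_kind`, `token`, and `tokenize` are hypothetical, and the token categories (identifiers, unsigned integer literals, single-character symbols) were picked for brevity rather than for any particular format.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token categories; not part of fennec's actual API.
enum class token_kind { identifier, number, symbol };

struct token {
    token_kind  kind;
    std::string text;
};

// Splits `src` into identifiers, unsigned integer literals, and
// single-character symbols. Whitespace separates tokens and is dropped.
inline std::vector<token> tokenize(std::string_view src) {
    const auto is_space = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };
    const auto is_digit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };
    const auto is_alpha = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
    };

    std::vector<token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (is_space(src[i])) { ++i; continue; }

        const std::size_t start = i;
        token_kind kind;
        if (is_alpha(src[i])) {
            kind = token_kind::identifier;
            while (i < src.size() && (is_alpha(src[i]) || is_digit(src[i]))) ++i;
        } else if (is_digit(src[i])) {
            kind = token_kind::number;
            while (i < src.size() && is_digit(src[i])) ++i;
        } else {
            kind = token_kind::symbol; // any other character stands alone
            ++i;
        }
        assert(i > start); // every token consumes at least one character
        out.push_back({kind, std::string(src.substr(start, i - start))});
    }
    return out;
}
```

For the input `x1 = 42;` this produces four tokens: an identifier `x1`, a symbol `=`, a number `42`, and a symbol `;`. The later stages (lexical, syntax, and semantic analysis) would consume a token stream like this rather than raw characters.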

Writing

The writers will be responsible for writing data in a specific format, i.e. converting
data values (e.g. floats, ints, etc.) to a readable text encoding (e.g. ascii/utf8/utf16).

  • Writer
  • Binary Writer
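
To make the two writers a bit more concrete, here is a minimal sketch of each. The class names `text_writer` and `binary_writer`, their methods, and the little-endian byte order are assumptions for the example, not fennec's actual design. Integers are formatted with the standard `std::to_chars`, and doubles with `std::snprintf`.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical text writer: appends values to an ASCII buffer as text.
class text_writer {
public:
    text_writer& write_int(long long value) {
        char buf[32]; // large enough for any 64-bit integer plus sign
        const auto res = std::to_chars(buf, buf + sizeof buf, value);
        assert(res.ec == std::errc{});
        out_.append(buf, static_cast<std::size_t>(res.ptr - buf));
        return *this;
    }

    text_writer& write_float(double value) {
        char buf[64];
        // %.17g keeps enough digits to round-trip a double, at the cost of
        // long output for values such as 0.1.
        const int n = std::snprintf(buf, sizeof buf, "%.17g", value);
        assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
        out_.append(buf, static_cast<std::size_t>(n));
        return *this;
    }

    text_writer& write_text(std::string_view text) {
        out_.append(text.data(), text.size());
        return *this;
    }

    const std::string& str() const { return out_; }

private:
    std::string out_;
};

// Hypothetical binary writer: appends fixed-width values as raw bytes.
// Little-endian order is an arbitrary choice for this sketch.
class binary_writer {
public:
    binary_writer& write_u32(std::uint32_t value) {
        for (int shift = 0; shift < 32; shift += 8) {
            out_.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFFu));
        }
        return *this;
    }

    const std::vector<std::uint8_t>& bytes() const { return out_; }

private:
    std::vector<std::uint8_t> out_;
};
```

Calls can be chained, e.g. `text_writer{}.write_int(-42).write_text(" ").write_float(1.5)` builds the text `-42 1.5`. Writing the bytes by shifting rather than copying memory keeps the binary output independent of the host's endianness.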

Formats (langproc/formats)

This submodule will contain classes for processing a variety of file formats.

Serialization

Symbol Implemented Passed
JSON
HTML
XML
YAML

Configuration

Symbol Implemented Passed
INI
TOML

Documents

Symbol Implemented Passed
ODF
Markdown
PDF

Spreadsheets & Tables

Symbol Implemented Passed
ODS
CSV

Audio Formats

Symbol Implemented Passed
MP3
WAV
AAC

Graphics Formats

Raster Textures

Symbol Implemented Passed
BMP
DDS
JPG
PNG
TIFF

Vector Graphics

Symbol Implemented Passed
OTF
SVG
TTF

3D Model Formats

Unfortunately, most 3D model formats are esoteric due to copyright/trademark/etc.
I will be using assimp for the time being. Below is a list of formats supported
by assimp.

Symbol Implemented Passed
3D
3DS
3MF
AC
AC3D
ACC
AMJ
ASE
ASK
B3D
BVH
CSM
COB
DAE/Collada
DXF
ENFF
FBX
glTF 1.0 + GLB
glTF 2.0
HMB
IFC-STEP
IQM
IRR / IRRMESH
LWO
LWS
LXO
M3D
MD2
MD3
MD5
MDC
MDL
MESH / MESH.XML
MOT
MS3D
NDO
NFF
OBJ
OFF
OGEX
PLY
PMX
PRJ
Q3O
Q3S
RAW
SCN
SIB
SMD
STP
STL
TER
UC
USD
VTA
X
X3D
XGL
ZGL

Video Formats

Symbol Implemented Passed
MP4
AVI
MPG
MOV