The largest open-source dataset of car designs, including their aerodynamics
Open-source aerodynamics tools and methods
- Multiple commenters recommend open-source tools for aircraft/RC/flying-wing design:
  - CFD: OpenFOAM (powerful but often overkill for hobby use).
  - 2D airfoils: XFOIL.
  - 3D / vortex lattice / panel methods: AVL, VSP Aero, XFLR5, FreeWake, Datcom.
  - Higher-level toolkit: AeroSandbox (GitHub).
- Consensus: for “normal-ish” designs, panel / potential-flow models are much easier and faster than full CFD, and good enough for performance and stability estimates.
- Emphasis that some aeronautics background is needed to interpret results and understand limitations.
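
As a rough illustration of why potential-flow estimates are so much cheaper than CFD, the sketch below evaluates a closed-form lift and induced-drag estimate from classical thin-airfoil and lifting-line theory. It is not taken from the thread or from any of the tools listed above. The function names, the example wing, and the single span-efficiency factor used for both lift slope and induced drag are simplifying assumptions, and the result only holds for unswept wings at small angles of attack in attached flow.

```python
import math


def wing_lift_coefficient(alpha_deg, aspect_ratio, span_efficiency=0.9,
                          alpha_zero_lift_deg=0.0):
    """Finite-wing lift coefficient (inviscid, small angles, unswept wing)."""
    a0 = 2.0 * math.pi  # thin-airfoil 2D lift-curve slope, per radian
    # Lifting-line correction of the 2D slope for finite aspect ratio.
    a = a0 / (1.0 + a0 / (math.pi * span_efficiency * aspect_ratio))
    return a * math.radians(alpha_deg - alpha_zero_lift_deg)


def induced_drag_coefficient(cl, aspect_ratio, span_efficiency=0.9):
    """Induced drag coefficient, CDi = CL^2 / (pi * e * AR)."""
    return cl ** 2 / (math.pi * span_efficiency * aspect_ratio)


if __name__ == "__main__":
    # Hypothetical RC-scale wing: aspect ratio 8 at 5 degrees angle of attack.
    cl = wing_lift_coefficient(5.0, aspect_ratio=8.0)
    cdi = induced_drag_coefficient(cl, aspect_ratio=8.0)
    print(f"CL  = {cl:.3f}")
    print(f"CDi = {cdi:.4f}")
```

An estimate like this evaluates instantly but says nothing about stall, separation, or viscous drag, which is where the aeronautics background mentioned above comes in.
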
Dataset scope, content, and access
- The data is hosted on Harvard Dataverse and linked from a GitHub repository; it is also mirrored at caemldatasets.org without access restrictions.
- Dataset size is reported as “a few hundred gigabytes.”
- Clarification that the dataset consists of parametric, randomized car-like shapes derived from a template, not CAD models of real production cars.
- One linked paper describes it as a large multimodal car dataset with CFD simulations and deep learning benchmarks; practical downstream uses are not deeply discussed in the thread.
Licensing and “open source” controversy
- Dataset is licensed under Creative Commons Attribution–NonCommercial (CC BY‑NC 4.0).
- Several commenters argue that:
  - Non-commercial licenses are not “open source” under established definitions.
  - Calling it “open source” is misleading and waters down important legal concepts.
- Others note:
  - CC BY‑NC still allows broad access, but restricts commercialization of the dataset.
  - It is ambiguous whether using the data inside a commercial workflow (e.g., training models, designing a car) would violate “NC”; interpretations differ and the question is left unresolved.
Car design, modularity, and aesthetics
- Some lament that aerodynamics and regulation push all cars toward similar “blobs.”
- Others welcome standardization and shared parts for cost and repair benefits, but note manufacturers have incentives to avoid interchangeable components to protect margins.
- There is a tension between:
  - Desire for cheap, robust, standardized vehicles.
  - Desire for emotionally appealing, varied designs; several commenters feel current design trends are in a “dark age.”
EV size, weight, and efficiency debates
- Extensive side discussion on why many modern EVs are heavy crossovers/SUVs:
  - Large, heavy battery packs to meet customer range expectations.
  - Consumer preference and profit margins for SUVs and crossovers.
  - Safety, crash standards, and “arms race” dynamics with larger vehicles.
- Energy-density arguments:
  - One side cites ≈100× higher gravimetric energy density of gasoline vs batteries.
  - Another points out EVs’ higher drivetrain efficiency and regenerative braking, reducing the effective gap to around 5× in practice for usable energy.
  - Disagreement on whether that numerical refinement is “pedantic” or materially important.
- Some argue ICE vehicles remain more mature and cost-effective; others claim certain EVs (e.g., mid-market crossovers) can already be more cost-effective than comparable ICE models.
- Electric aircraft are mentioned as an example where battery weight and inability to “burn off” mass severely limit viability; landing-weight constraints exacerbate this.
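
The structure of the energy-density argument above can be sketched as a one-line correction: scale the raw gravimetric ratio by how much of each energy store actually reaches the wheels. The function name and the efficiency values below are illustrative assumptions, not figures from the thread. The sketch models only drivetrain efficiency; regenerative braking and the choice of which masses to count (cell vs. pack, fuel vs. fuel plus engine) are not modeled, so it is not expected to reproduce either of the commenters' numbers.

```python
def effective_energy_gap(nominal_ratio, ice_efficiency, ev_efficiency):
    """Useful energy per kg of gasoline relative to per kg of battery.

    nominal_ratio   -- raw gravimetric energy-density ratio (gasoline / battery)
    ice_efficiency  -- fraction of fuel energy reaching the wheels (assumed)
    ev_efficiency   -- fraction of battery energy reaching the wheels (assumed)
    """
    for name, value in (("ice_efficiency", ice_efficiency),
                        ("ev_efficiency", ev_efficiency)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    return nominal_ratio * ice_efficiency / ev_efficiency


if __name__ == "__main__":
    # Placeholder efficiencies chosen only to show the sensitivity of the result.
    for ice_eff in (0.20, 0.30):
        for ev_eff in (0.80, 0.90):
            gap = effective_energy_gap(100.0, ice_eff, ev_eff)
            print(f"ICE {ice_eff:.0%}, EV {ev_eff:.0%}: effective gap {gap:.1f}x")
```

Because the answer moves substantially with the assumed inputs, the sketch also shows why the two sides could disagree about whether the refinement matters.
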
Future EV architectures and packaging
- Several comments note that many current EVs are adapted from ICE platforms, limiting efficiency and packaging gains.
- Others highlight a shift toward EV-native platforms:
  - Cited examples include Tesla and Hyundai, along with quotes from an automaker CEO promising “super-efficient platforms” that combine a small exterior with a larger interior volume, at sub‑$40k or even sub‑$30k price targets.
- Suggestion that truly compact, efficient, family-friendly EVs may emerge within a few years.
Questions about aerodynamic extremes
- A commenter asks how many of the ~8,000 shapes achieve very low drag coefficients (Cd < 0.20), citing a historical EV with Cd 0.19.
- The thread does not provide an answer; distribution of Cd values in the dataset remains unclear.
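
For anyone with the data in hand, the commenter's question reduces to a threshold count over the per-shape drag coefficients. The sketch below assumes those coefficients have already been loaded into a list; the function name and the sample values are made up for illustration and are not taken from the dataset.

```python
def share_below(cd_values, threshold=0.20):
    """Return (count, fraction) of drag coefficients strictly below threshold."""
    values = list(cd_values)
    if not values:
        raise ValueError("cd_values must not be empty")
    count = sum(1 for cd in values if cd < threshold)
    return count, count / len(values)


if __name__ == "__main__":
    # Made-up Cd values standing in for the real per-shape results.
    sample_cd = [0.19, 0.24, 0.31, 0.27, 0.18, 0.35]
    count, fraction = share_below(sample_cd, threshold=0.20)
    print(f"{count} of {len(sample_cd)} shapes ({fraction:.1%}) have Cd < 0.20")
```
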